What does your Android keyboard tell strangers?

We explore whether it’s possible to reveal all your secrets via your smartphone’s on-screen keyboard.

Is it possible to spy on keystrokes from an Android on-screen keyboard?

“Hackers can spy on every keystroke of Honor, OPPO, Samsung, Vivo, and Xiaomi smartphones over the internet” – alarming headlines like this have been circulating in the media over the past few weeks. Their origin was a rather serious study on vulnerabilities in keyboard traffic encryption. Attackers who are able to observe network traffic, for example, through an infected home router, can indeed intercept every keystroke and uncover all your passwords and secrets. But don’t rush to trade in your Android for an iPhone just yet – this only concerns Chinese language input using the pinyin system, and only if the “cloud prediction” feature is enabled. Nevertheless, we thought it would be worth investigating the situation with other languages and keyboards from other manufacturers.

Why many pinyin keyboards are vulnerable to eavesdropping

The pinyin writing system, also known as the Chinese phonetic alphabet, helps users write Chinese words using Latin letters and diacritics. It’s the official romanization system for the Chinese language, adopted by the UN among others. Drawing Chinese characters on a smartphone is rather inconvenient, so the pinyin input method is very popular, used by over a billion people, according to some estimates. Unlike many other languages, word prediction for Chinese, especially in pinyin, is difficult to implement directly on a smartphone – it’s a computationally complex task. Therefore, almost all keyboards (or more precisely, input methods – IMEs) use “cloud prediction”, meaning they instantaneously send the pinyin characters entered by the user to a server and receive word completion suggestions in return. Sometimes the “cloud” function can be turned off, but this reduces the speed and quality of the Chinese input.

To predict the text entered in pinyin, the keyboard sends data to the server

To predict the text entered in pinyin, the keyboard sends data to the server

Of course, all the characters you type are accessible to the keyboard developers due to the “cloud prediction” system. But that’s not all! Character-by-character data exchange requires special encryption, which many developers fail to implement correctly. As a result, all keystrokes and corresponding predictions can be easily decrypted by outsiders.

You can find details about each of the errors found in the original source, but overall, of the nine keyboards analyzed, only the pinyin IME in Huawei smartphones had correctly implemented TLS encryption and resisted attacks. However, IMEs from Baidu, Honor, iFlytek, OPPO, Samsung, Tencent, Vivo, and Xiaomi were found to be vulnerable to varying degrees, with Honor’s standard pinyin keyboard (Baidu 3.1) and QQ pinyin failing to receive updates even after the researchers contacted the developers. Pinyin users are advised to update their IME to the latest version, and if no updates are available, to download a different pinyin IME.

Do other keyboards send keystrokes?

There is no direct technical need for this. For most languages, word and sentence endings can be predicted directly on the device, so popular keyboards don’t require character-by-character data transfer. Nevertheless, data about entered text may be sent to the server for personal dictionary synchronization between devices, for machine learning, or for other purposes not directly related to the primary function of the keyboard – such as advertising analytics.

Whether you want such data to be stored on Google and Microsoft servers is a matter of personal choice, but it’s unlikely that anyone would be interested in sharing it with outsiders. At least one such incident was publicized in 2016 – the SwiftKey keyboard was found to be predicting email addresses and other personal dictionary entries of other users. After the incident, Microsoft temporarily disabled the synchronization service, presumably to fix the errors. If you don’t want your personal dictionary stored on Microsoft’s servers, don’t create a SwiftKey account, and if you already have one, deactivate it and delete the data stored in the cloud by following these instructions.

There have been no other widely known cases of typed text being leaked. However, research has shown that popular keyboards actively monitor metadata as you type. For example, Google’s Gboard and Microsoft’s SwiftKey send data about every word entered: language, word length, the exact input time, and the app in which the word was entered. SwiftKey also sends statistics on how much effort was saved: how many words were typed in full, how many were automatically predicted, and how many were swiped. Considering that both keyboards send the user’s unique advertising ID to the “headquarters”, this creates ample opportunity for profiling – for example, it becomes possible to determine which users are corresponding with each other in any messenger.

If you create a SwiftKey account and don’t disable the “Help Microsoft improve” option, then according to the privacy policy, “small samples” of typed text may be sent to the server. How this works and the size of these “small samples” is unknown.

"Help Microsoft improve"... what? Collecting your data?

“Help Microsoft improve”… what? Collecting your data?

Google allows you to disable the “Share Usage Statistics” option in Gboard, which significantly reduces the amount of information transmitted: word lengths and apps where the keyboard was used are no longer included.

Disabling the "Share Usage Statistics" option in Gboard significantly reduces the amount of information collected

Disabling the “Share Usage Statistics” option in Gboard significantly reduces the amount of information collected

In terms of cryptography, data exchange in Gboard and SwiftKey did not raise any concerns among the researchers, as both apps rely on the standard TLS implementation in the operating system and are resistant to common cryptographic attacks. Therefore, traffic interception in these apps is unlikely.

In addition to Gboard and SwiftKey, the authors also analyzed the popular AnySoftKeyboard app. It fully lived up to its reputation as a keyboard for privacy diehards by not sending any telemetry to servers.

Is it possible for passwords and other confidential data to leak from a smartphone?

An app doesn’t have to be a keyboard to intercept sensitive data. For example, TikTok monitors all data copied to the clipboard, even though this function seems unnecessary for a social network. Malware on Android often activates accessibility features and administrator rights on smartphones to capture data from input fields and directly from files of “interesting” apps.

On the other hand, an Android keyboard can “leak” not only typed text. For example, the AI.Type keyboard caused a data leak for 31 million users. For some reason, it collected data such as phone numbers, exact geolocations, and even the contents of address books.

How to protect yourself from keyboard and input field spying

  • Whenever possible, use a keyboard that doesn’t send unnecessary data to the server. Before installing a new keyboard app, search the web for information about it – if there have been any scandals associated with it, it will show up immediately.
  • If you’re more concerned about the keyboard’s convenience than its privacy (we don’t judge, the keyboard is important), go through the settings and disable the synchronization and statistics transfer options wherever possible. These may be hidden under various names, including “Account”, “Cloud”, “Help us improve”, and even “Audio donations”.
  • Check which Android permissions the keyboard needs and revoke any that it doesn’t need. Access to contacts or the camera is definitely not necessary for a keyboard.
  • Only install apps from trusted sources, check the app’s reputation, and, again, don’t give it excessive permissions.
  • Use comprehensive protection for all your Android and iOS smartphones, such as Kaspersky Premium.
Tips