In late December 2022, a team of scientists from several US universities published a paper on wiretapping. The eavesdropping method they explore is rather unusual: the words of the person you’re talking to, reproduced through your phone’s speaker, can be picked up by a built-in sensor known as the accelerometer. At first glance, this approach doesn’t seem to make sense: why not just intercept the audio signal itself or the data? The fact is that modern smartphone operating systems do an excellent job of protecting phone conversations, and in any case most apps don’t have permission to record sound during calls. But the accelerometer is freely accessible, which opens up new methods of surveillance. This is a type of side-channel attack, one that so far, fortunately, remains completely theoretical. But, over time, such research could make non-standard wiretapping a reality.
An accelerometer is a special sensor for measuring acceleration; together with another sensor, a gyroscope, it helps to detect changes in the position of the phone it is built into. Accelerometers have been built into all smartphones for more than a decade now. Among other things, they rotate the image on the screen when you turn your phone round. Sometimes they are used in games or, say, in augmented reality apps, where virtual elements are overlaid on the image from the phone’s camera. Step-counters work by tracking phone vibrations as the user walks. And if you flip your phone to mute an incoming call, or tap the screen to wake up the device, these actions too are picked up by the accelerometer.
How can this standard yet “invisible” sensor eavesdrop on your conversations? When the other person speaks, their voice is played through the built-in speaker, causing it, and the body of the smartphone, to vibrate. It turns out that the accelerometer is sensitive enough to detect these vibrations. Although researchers have known about this for some time, the tiny size of these vibrations ruled out full-fledged wiretapping. But in recent years, the situation has changed for the worse: smartphones now boast more powerful speakers. Why? To improve the volume and sound quality when you’re watching a video, for example. A byproduct of this is better sound quality during phone calls, since they use the same speaker. The US team of scientists clearly demonstrates this in their paper, comparing accelerometer spectrograms from an older and a newer phone:
The spectrogram on the left comes from a relatively old smartphone of 2016 vintage, not equipped with powerful stereo speakers. The spectrograms in the center and on the right come from the accelerometer of a more modern device. In each case, the word “zero” is played six times through the speaker. With the old smartphone, the sound is barely reflected in the acceleration data; with the new one, a pattern emerges that roughly corresponds to the played words. The best result can be seen in the graph on the right, where the device is in loudspeaker mode. But even during a normal conversation, with the phone pressed to the ear, there is enough data for analysis. It turns out that the accelerometer acts as a microphone!
Let’s pause here to evaluate the difficulty of the task the researchers set for themselves. The accelerometer may act as a microphone, but a very, very poor one. Suppose we got the user to install malware that tries to eavesdrop on phone conversations, or we built a wiretapping module into a popular game. As mentioned above, our program doesn’t have permission to directly record conversations, but it can monitor the state of the accelerometer. The number of requests to this sensor per second is limited and depends on the specific model of both the sensor and the smartphone. This number is known as the sampling rate and is measured in hertz (Hz). For example, one of the phones in the study allowed 420 requests per second (420Hz), and another allowed 520Hz. Starting with version 12, the Android operating system introduced a limit of 200Hz. The sampling rate limits the frequency range of the resulting “sound recording”: the highest frequency that can be captured is half the rate at which we can receive data from the sensor. This means that at best the researchers had access to the frequency range from 1 to 260Hz.
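The arithmetic behind that limit can be sketched in a few lines of Python. This is only an illustration of the general sampling rule (the usable band ends at half the sampling rate), not code from the paper; the function names and the labels in `RATES_HZ` are made up for this example. The second function shows the standard folding rule for a pure tone above that limit, which is a general property of sampling rather than a claim about how the researchers processed their data.

```python
def nyquist_limit_hz(sampling_rate_hz: float) -> float:
    """Highest frequency that can be represented at a given sampling rate."""
    return sampling_rate_hz / 2.0


def alias_frequency_hz(tone_hz: float, sampling_rate_hz: float) -> float:
    """Apparent frequency of a pure tone after sampling (standard folding rule)."""
    folded = tone_hz % sampling_rate_hz
    return min(folded, sampling_rate_hz - folded)


# Sampling rates mentioned in the article; the labels are illustrative only.
RATES_HZ = {
    "phone A in the study": 420,
    "phone B in the study": 520,
    "Android 12 limit": 200,
}

for label, rate in RATES_HZ.items():
    # Print the upper edge of the band each sampling rate can capture.
    print(f"{label}: {rate}Hz sampling, band up to {nyquist_limit_hz(rate):.0f}Hz")
```

With a 520Hz sampling rate the upper edge is 260Hz, which matches the figure above; with the 200Hz limit it drops to 100Hz.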
The frequency range for voice transmission is from around 300 to 3400Hz, but what the accelerometer “overhears” is not a voice: if we try to play back this “recording” we get a murmuring noise that only remotely resembles the original sound. The researchers used machine learning to analyze these voice traces. They created a program that takes known samples of the human voice and compares them with data they captured from the accelerometer. Such training further allows a voice recording of unknown content to be deciphered with a certain margin of error.
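To make the “train on known samples, then decipher unknown ones” idea concrete, here is a deliberately tiny sketch. It is not the researchers’ model: it swaps in a nearest-centroid classifier, and the three-number feature vectors in `TRAINING` are invented placeholders standing in for whatever features would be extracted from real accelerometer traces.

```python
import math
from collections import defaultdict


def centroid(vectors):
    """Component-wise mean of equally long feature vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def train(labeled_traces):
    """Build one average feature vector per known word."""
    grouped = defaultdict(list)
    for label, features in labeled_traces:
        grouped[label].append(features)
    return {label: centroid(vectors) for label, vectors in grouped.items()}


def classify(model, features):
    """Return the known word whose centroid is closest to the new trace."""
    return min(model, key=lambda label: math.dist(model[label], features))


# Hypothetical features for traces whose content is known in advance.
TRAINING = [
    ("zero", [0.9, 0.2, 0.1]),
    ("zero", [0.8, 0.3, 0.1]),
    ("one", [0.1, 0.7, 0.6]),
    ("one", [0.2, 0.8, 0.5]),
]

model = train(TRAINING)
# Classify a trace of unknown content by its nearest centroid.
print(classify(model, [0.85, 0.25, 0.15]))
```

A real system would need far richer features and a far larger training set, which is part of why the recognition accuracy reported in the study is limited.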
For researchers of wiretapping methods, this is all too familiar. The authors of the new paper refer to a host of predecessors who have shown how to obtain voice data using the seemingly most unlikely of objects. Here’s a real example of a spying technique: from a nearby building, attackers direct an invisible laser beam at the window of the room where the conversation they want to eavesdrop on is taking place. The sound waves from the voices cause the window pane to vibrate ever so slightly, and this vibration is traceable in the reflected laser beam. And this data is sufficient to restore the content of a private conversation. Back in 2020, scientists from Israel showed how speech can be reconstructed from the vibrations of an ordinary light bulb. Sound waves cause small changes in its brightness, which can be detected at a distance of up to 25 meters. Accelerometer-based eavesdropping is very similar to these spying tricks, but with one important difference: the “bug” is already built into the device to be tapped.
Yes, but to what extent can the content of a conversation be recovered from accelerometer data? Although the new paper seriously improves the quality of wiretapping, the method cannot yet be called reliable. In 92% of cases, the accelerometer data made it possible to distinguish one voice from another. In 99% of cases, it was possible to correctly determine gender. Actual speech was recognized with an accuracy of 56%, meaning that nearly half of the words could not be reconstructed. And the data set used in the test was extremely limited: just three people saying a number several times in succession.
What the paper did not cover was the ability to analyze the speech of the smartphone user. If we only hear the sound from the speaker, at best we have only half the conversation. When we press the phone to our ear, vibrations from our speech should also be felt by the accelerometer, but the quality is bound to be far worse than the vibrations from the speaker. This remains to be studied in more detail in new research.
Fortunately, the scientists were not looking to create a usable wiretapping device for the here and now. They were simply testing out new methods of privacy invasion that may one day become relevant. Such studies allow device manufacturers and software developers to proactively develop protection against theoretical threats. Incidentally, the 200Hz sampling rate limit introduced in Android 12 does not really help: the recognition accuracy in real experiments decreased, but not by much. Far greater interference comes from what the smartphone user naturally does during a conversation: their voice, hand movements, general moving around. The researchers were unable to reliably filter out these vibrations from the useful signal.
The most important aspect of the study was the use of the smartphone’s built-in sensor: all previous methods relied on various additional tools, but here we have out-of-the-box eavesdropping. Despite the modest practical results, this interesting study shows how such a complex device as a smartphone is full of potential data breaches. On a related note, we recently wrote about how signals from Wi-Fi modules in phones, computers, and other devices unwittingly give away their location, how robot vacuum cleaners spy on their owners, and how IP cameras like to peep where they shouldn’t.
And while such surveillance methods are unlikely to threaten the average user, it would be nice if the technology of the future were armed against all risks of spying, eavesdropping, and sneaky peeking, however small. But since these cases involve malware being installed on your smartphone, you should always have the ability to trace and block it.