A UK-based research team trained an artificial intelligence (AI) model to decode keystroke sounds from Zoom audio with an accuracy of up to 93%, revealing what was typed. As Futurism reported, the finding raises substantial cybersecurity concerns.
The team emphasized the prevalence of microphone-equipped devices and the emergence of audio-focused cyberattacks. Combined with advances in deep learning, these trends pose a significant risk of exposing sensitive data to malicious individuals.
Training AI to Recognize Keystroke Sounds
In their paper titled "A Practical Deep Learning-Based Acoustic Side Channel Attack on Keyboards," the team asserts that the combination of widespread machine learning, microphones, and video calls poses an increased threat to keyboards.
Specifically, laptops are more vulnerable to having their keystrokes captured in quieter public environments like coffee shops, libraries, or offices, particularly because of the uniform, non-modular keyboard designs found in most laptops.
Previous efforts to log keystrokes in Voice over Internet Protocol (VoIP) calls, without physical access to the target, achieved a top-5 accuracy of 91.7% over Skype in 2017 and 74.3% accuracy in VoIP calls in 2018, Ars Technica reported.
Combining keystroke interpretations with a "hidden Markov model" (HMM), which predicts likely next-letter outcomes and can correct typographical errors, significantly improved accuracy in a previous side channel study, increasing from 72% to 95%. Researchers believe their paper pioneers the utilization of recent advancements in neural network technology to enable an audio-based side channel attack.
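To illustrate how an HMM can clean up noisy key guesses, here is a minimal sketch of Viterbi decoding in Python. It is not the researchers' code: the function names, the tiny word list, and the classifier probabilities are all made up for illustration. The "transition" scores come from letter-pair counts in the word list, and the "emission" scores stand in for what an acoustic classifier might output for each keystroke.

```python
import math
from collections import defaultdict

ALPHABET = "abcdefghijklmnopqrstuvwxyz"

def bigram_log_probs(words, alphabet=ALPHABET, smoothing=1.0):
    """Estimate log P(next letter | current letter) with add-one smoothing."""
    counts = defaultdict(lambda: defaultdict(float))
    for word in words:
        for a, b in zip(word, word[1:]):
            counts[a][b] += 1
    log_probs = {}
    for a in alphabet:
        total = sum(counts[a].values()) + smoothing * len(alphabet)
        log_probs[a] = {
            b: math.log((counts[a][b] + smoothing) / total) for b in alphabet
        }
    return log_probs

def viterbi(observations, trans_logp, alphabet=ALPHABET, floor=1e-6):
    """Return the most likely letter sequence.

    observations: one dict per keystroke mapping letter -> classifier
    probability (hypothetical values). Letters missing from a dict get a
    small floor probability. The first letter uses a uniform prior.
    """
    best = {s: math.log(observations[0].get(s, floor)) for s in alphabet}
    backpointers = []
    for obs in observations[1:]:
        new_best, pointers = {}, {}
        for s in alphabet:
            # Best previous letter to arrive at letter s.
            prev = max(alphabet, key=lambda p: best[p] + trans_logp[p][s])
            new_best[s] = (
                best[prev] + trans_logp[prev][s] + math.log(obs.get(s, floor))
            )
            pointers[s] = prev
        backpointers.append(pointers)
        best = new_best
    # Trace the best path backwards from the most likely final letter.
    path = [max(alphabet, key=lambda s: best[s])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return "".join(reversed(path))

# Toy data (assumed): the classifier slightly prefers "j" over "h"
# for the second keystroke.
toy_words = ["the", "then", "they", "there", "that"]
toy_observations = [
    {"t": 0.9, "r": 0.1},
    {"h": 0.4, "j": 0.6},
    {"e": 0.9, "w": 0.1},
]
decoded = viterbi(toy_observations, bigram_log_probs(toy_words))
```

In this toy case, taking the classifier's top guess for each key would read "tje", while the decoder returns "the" because "th" and "he" are far more common letter pairs in the word list. A real attack would need a much larger language corpus and real classifier outputs.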
To test their concept, the researchers used a 2021 MacBook Pro with a keyboard design similar to models from the past two years. They typed on 36 keys 25 times each to train their model on the associated waveforms.
For the first test, they recorded the keyboard's audio using an iPhone 13 mini positioned 17 cm away. For the second, they recorded the laptop keys over Zoom, using the MacBook's built-in microphones with minimal noise suppression. In both tests, accuracy exceeded 93 percent, with the phone-recorded audio approaching 95 to 96 percent.
The researchers observed that a key's position played a significant role in determining its unique audio profile, with most misclassifications occurring only one or two keys away. Due to this pattern, there appears to be potential for a secondary machine-assisted system to rectify such errors, leveraging a substantial language dataset and approximate key locations.
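As a rough illustration of that idea, the Python sketch below corrects a decoded word by looking for a dictionary word whose letters all sit within about two keys of the decoded letters. This is an assumption-laden toy, not something from the paper: the key coordinates are an approximate QWERTY layout, the names `KEY_POS`, `key_distance`, and `correct_word` are hypothetical, and it handles lowercase letters only.

```python
import math

# Approximate QWERTY letter positions (assumed): column plus a row stagger.
ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]
ROW_OFFSETS = [0.0, 0.25, 0.75]
KEY_POS = {
    ch: (col + ROW_OFFSETS[r], float(r))
    for r, row in enumerate(ROWS)
    for col, ch in enumerate(row)
}

def key_distance(a, b):
    """Euclidean distance between two keys, in units of one key width."""
    (x1, y1), (x2, y2) = KEY_POS[a], KEY_POS[b]
    return math.hypot(x1 - x2, y1 - y2)

def correct_word(decoded, vocabulary, max_key_distance=2.0):
    """Pick the vocabulary word physically closest to the decoded word.

    Only same-length words are considered, and any candidate needing a
    jump of more than max_key_distance on a single key is rejected.
    Returns the decoded word unchanged if no candidate qualifies.
    """
    best, best_cost = decoded, float("inf")
    for word in vocabulary:
        if len(word) != len(decoded) or not word:
            continue
        dists = [key_distance(a, b) for a, b in zip(decoded, word)]
        if max(dists) > max_key_distance:
            continue
        cost = sum(dists)
        if cost < best_cost:
            best, best_cost = word, cost
    return best
```

For example, `correct_word("passqord", ["password", "passport", "keyboard"])` returns "password", since "q" and "w" are adjacent while the other candidates would require much larger jumps. A practical system would also weigh how common each word is, which this sketch leaves out.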
Mitigating These Kinds of Cyberattacks
The model achieved 95% and 93% accuracy on the phone-recorded and Zoom-recorded tests, respectively, outperforming similar earlier attacks. Even so, co-author Ehsan Toreini said in an interview that both the accuracy of such attacks and the concern around them will grow as microphone-equipped devices proliferate.
To mitigate these kinds of cyberattacks, the paper suggested the following:
- Changing typing style, especially by switching to touch typing, which is harder for these attacks to identify accurately.
- Employing randomized passwords with mixed cases, as these attacks struggle to detect shift key "release peaks."
- Injecting randomly generated fake keystrokes into transmitted audio during video calls, though this might impact usability.
- Utilizing biometric tools such as fingerprint or face scanning instead of typed passwords.
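The randomized, mixed-case password suggestion is the easiest of these to sketch in code. The Python example below is a minimal illustration using the standard library's `secrets` module; the function name `random_mixed_case_password` and the default length of 16 are assumptions, not recommendations from the paper.

```python
import secrets
import string

def random_mixed_case_password(length=16):
    """Generate a random password with at least one upper- and lowercase letter."""
    if length < 2:
        raise ValueError("length must be at least 2 to mix cases")
    alphabet = string.ascii_letters + string.digits
    while True:
        # secrets.choice draws from a cryptographically secure source.
        password = "".join(secrets.choice(alphabet) for _ in range(length))
        has_lower = any(c.islower() for c in password)
        has_upper = any(c.isupper() for c in password)
        if has_lower and has_upper:
            return password
```

Calling `random_mixed_case_password()` returns a different string each time. Every uppercase letter forces a shift-key press, which, according to the paper's suggestion above, these attacks struggle to detect.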