Press Release

Date: 30 January 2024

Voice deepfakes cannot be reliably detected by humans or security systems. Attacks are on the rise

Spreading alarming messages or obtaining confidential company or bank data: artificial intelligence is developing so rapidly that almost anyone can now create high-quality deepfake voice recordings at home. Neither humans nor voice biometric systems can reliably distinguish artificial speech from real speech. Researchers from FIT BUT, together with developers of commercial systems, now want to design more reliable testing and more accurate detection of deepfakes. They are responding to a call from the Ministry of the Interior.

Anton Firc from FIT BUT first addressed the issue of deepfakes in his master's thesis, in which he investigated the resistance of voice biometrics to deepfakes. The topic was then followed up by Daniel Prudky, whose research sent voice messages to 31 respondents and investigated their ability to detect deepfakes in ordinary conversation. "The respondents were told a cover story about testing the user-friendliness of voicemail. One deepfake recording was included in the test conversations and their reactions were monitored. The results showed that none of them detected the fraudulent deepfake message," Firc explains.

However, in the same experiment, when respondents were told that one of the voicemails was a fake, they were able to identify it with almost 80% accuracy. "The research thus showed that although a deepfake recording is easily identifiable among real ones, no one detects it in a normal conversation," Firc adds. Part of the reason, he says, is that the person being called does not expect a deepfake in that context, and that is exactly what the creators of deepfake recordings can exploit in practice.

"People don't expect to encounter a deepfake voice, and so are able to ignore even mistakes or poorer quality recordings. All phone and social network users are at risk. This opens up the possibility of vishing attacks, which is a combination of deepfake voice and phishing, on large numbers of people," the researcher adds, pointing out that raising general awareness may be a suitable protection.

Anyone who uses a phone or a computer, or has a social media account, is at risk, he says. A common example of a social engineering attack is the disclosure of internal company information over the phone. "The phone rings and a colleague from another office is calling. He knows the right phrasing and terminology, claims that his computer is not working, and asks you to look something up in the system for him, perhaps even to give him access credentials," Firc says.

Deepfakes expand the possibilities of these social engineering attacks. Even people without much technical knowledge can now create high-quality synthetic recordings at home. And voice biometric systems that verify the identity of callers to banks or call centres cannot reliably distinguish synthetic recordings from real human speech. "I have tested two commercially available voice biometric systems and confirmed that even they cannot distinguish a real recording from a synthetic one," says the researcher.

The biggest problem, he says, is that even the developers of biometric systems lack a methodology for testing how resistant their systems are to deepfake attacks. "There are deepfake detectors, models based on neural networks, that can spot anomalies in a recording that do not occur in normal speech and evaluate whether it is genuine or synthetic. But it is very difficult to explain what these models actually base their decisions on. The only thing experts have discovered so far is that deepfake recordings have more energy in the higher frequencies, whereas in human speech this energy is more linearly distributed," the researcher points out, adding that detecting and properly testing deepfakes is still in its infancy.
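To make the spectral-energy observation concrete, the following minimal Python sketch compares how much of a recording's spectral energy lies above a chosen frequency. The file names real.wav and synthetic.wav and the 4 kHz band split are assumptions made purely for illustration; the detectors described above rely on features learned by neural networks, not on a single hand-picked ratio like this.

```python
# Minimal sketch: compare how spectral energy is distributed in two recordings.
# Assumptions: 'real.wav' and 'synthetic.wav' are hypothetical mono WAV files,
# and the 4 kHz split between "low" and "high" bands is an arbitrary choice
# for illustration, not a threshold used by any production detector.
import numpy as np
from scipy.io import wavfile

def high_band_energy_ratio(path, split_hz=4000):
    rate, samples = wavfile.read(path)
    samples = samples.astype(np.float64)
    if samples.ndim > 1:                                 # mix stereo down to mono
        samples = samples.mean(axis=1)
    spectrum = np.abs(np.fft.rfft(samples)) ** 2         # power spectrum
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / rate)  # frequency of each bin
    total = spectrum.sum()
    high = spectrum[freqs >= split_hz].sum()             # energy above the split
    return high / total if total > 0 else 0.0

for name in ("real.wav", "synthetic.wav"):
    print(name, f"{high_band_energy_ratio(name):.3f}")
```

A noticeably higher ratio for the synthetic file would be consistent with the pattern Firc describes, although on its own such a crude measure is far from sufficient to serve as a detector.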

While banks and private companies are currently the main targets of attacks, in the future ordinary people may also suffer from holes in cyber security.

"One Slovakian bank is willing to issue you a credit card based on voice verification alone. Since data leaks are common and it's no problem to buy someone's personal information, it will be very easy to apply for a credit card in another person's name using deepfake voice recordings. What's more, artificial intelligence is evolving so quickly that we will soon be able to automate these attacks and incorporate language models like ChatGPT. In a worst-case scenario, this could create an army of artificial telemarketers who will call elderly people and pretend they are, for example, family members, have been in a car accident and need to send money immediately," Firc outlines possible scenarios for the misuse of deepfake recordings in the future.

The issue of deepfakes in cybersecurity has also been taken up by the Ministry of the Interior of the Czech Republic, which has launched a call for security research. Anton Firc (representing the Security@FIT group) is working on it together with the Speech@FIT group and the company Phonexia. The aim is to develop tools that can reliably identify artificially created recordings.

Author: Mgr. Petra Horná
