49.9 Billion KRW Stolen After Answering a Client's Call... 'Deep Voice' Phishing Grows More Sophisticated [Deepfake Fear Targeting Individuals and Businesses③]

Deep Voice Can Be Created with Only About Five Seconds of Video or Audio
Infiltrates Personal Calls and Corporate Customer Service Centers... Detection Technology Needed

Phishing crimes exploiting deep voice technology are becoming more sophisticated, highlighting the need to expand deep voice detection technology. Provided by Shutterstock



#A branch manager at a bank in the United Arab Emirates transferred $35 million (approximately 49.9 billion KRW) after receiving a call from a senior executive of a major client he had been dealing with; the voice turned out to be a deep voice forgery impersonating the executive.


#Ms. A, a woman in her 60s, received a call from her daughter saying, “I stood as a guarantor for a friend, but the friend is unreachable, so I am being held.” Shortly afterward, she received a threatening call demanding money. Ms. A rushed to the bank and withdrew 20 million KRW in cash. Fortunately, a bank employee who grew suspicious reported the incident, leading to the criminal’s arrest. Ms. A had no reason to suspect voice phishing because the voice she heard was her daughter’s, cloned with deep voice technology.


As generative artificial intelligence (AI) advances, ‘deep voice’ crimes, in which deep learning technology is used to reproduce false statements in another person’s voice, have emerged as a serious social problem. Not only celebrities, whose voices are publicly exposed, but also ordinary people, whose voices are recorded in personal videos or phone calls, are increasingly at risk of having their voices exploited in deep voice crimes.


Deep voice criminals replicate the voices of celebrities appearing in broadcasts or in clips posted on video-sharing sites and use them to synthesize fabricated content. With current technology, a deep voice can be produced from as little as about five seconds of video or audio containing another person’s voice.


Since it has become common for ordinary people to post videos of themselves online, anyone can become a target of deep voice crime. The fact that a voice can be recorded and exploited for crime through nothing more than a phone call has heightened concern further.


Deep voice is an efficient tool for content creation but is also misused in sophisticated voice phishing crimes. According to the National Police Agency, voice phishing damages amounted to 774.4 billion KRW in 2021, decreased to 543.8 billion KRW in 2022 and 447.2 billion KRW in 2023, but surged to 854.5 billion KRW last year.


Compared with several years ago, voice phishing that relies on unfamiliar voices is now widely recognized, and damages had been falling thanks to phishing detection applications (apps). Experts attribute last year’s surge, however, to the rapid advance of deep voice technology and the growth of sophisticated schemes that imitate the voices of family members or acquaintances.


Phishing that uses deep voice can harm companies through instructions issued in the voice of a client-company executive or the company’s own CEO, such as requests to look up confidential information or change transaction accounts, or can extort money by posing as a beloved family member such as a child. It can also mimic ordinary people’s voices to attack corporate customer service centers: criminals can call financial institutions or telecommunications companies in someone else’s voice to tamper with personal information or commit financial fraud.


Deep voice technology has reached the point where it can precisely replicate not only a speaker’s tone but also their intonation, and a simple online search already turns up dozens of companies producing deep voice content.


Accordingly, calls are growing across the industry to expand deep voice detection technology. As the risk of crimes abusing advanced deep voice increases, so does the need for detection technology that can distinguish finely manipulated voices in every industry, and a domestic security company has begun developing it.


IT security and authentication platform company RaonSecure has incorporated a function that uses AI to detect deepfake videos and images into its personal mobile antivirus app, ‘Raon Mobile Security,’ and is currently developing AI-based deep voice detection technology.


RaonSecure plans to supply the deep voice detection function to companies that operate customer consultation services. With this technology, companies can protect corporate information and assets from calls impersonating CEOs or executives and from fake customer consultations. In particular, companies that handle large volumes of customers’ personal information are reportedly planning to allocate budgets in the tens of billions of KRW to protect that information from deepfake and deep voice crime.


An expert in the security industry said, “Crimes abusing deep voice are becoming increasingly sophisticated with AI, and individuals and companies are increasingly aware that even calls from acquaintances, executives, or customers cannot be trusted without question.” He added, “The technology that can effectively counter this is likewise AI, trained on deep voice data; to protect personal and corporate information and assets, and ultimately national competitiveness, deep voice detection technology must be expanded and continuously researched and developed.”

© The Asia Business Daily(www.asiae.co.kr). All rights reserved.