Create an AI Voice with Your Own Voice by Recording Just 20 Minutes

Published 23 Sep. 2020 08:35 (KST)

Updated 22 Mar. 2023 16:41 (KST)


SK C&C and LionRocket sign 'AI Text-to-Speech Business Cooperation Agreement'


[Asia Economy Reporter Jin-gyu Lee] It is now possible to create an artificial intelligence (AI) voice that resembles your own from only about 20 minutes of recorded audio.

SK C&C Co., Ltd. (hereinafter SK C&C) signed an 'AI Text-to-Speech Conversion Business Cooperation Agreement (MOU)' with LionRocket on the 23rd. LionRocket is a startup whose AI-based voice synthesis technology closely mimics a speaker's voice and speaking style.

The two companies will combine natural language processing with proprietary text-to-speech technology to jointly target domestic and international content markets, focusing on culture and education. Their first priority is a project that converts printed books into audio to support learning and cultural activities for visually impaired people. As part of this effort, the two companies worked with the National Research Institute of Cultural Heritage under the Cultural Heritage Administration to produce the narration for a braille sensory book that introduces natural monuments to visually impaired readers.

The narration uses AI to reproduce the voices of actors Lee Byung-hun and Han Ji-min, who offered to donate their voices for visually impaired people. It will be included in the 'Natural Monument Multimedia Braille Sensory Book', scheduled for release on October 15, the Day of the Visually Impaired, and distributed to organizations for the visually impaired and to schools for the blind nationwide. It will also be available through the 'Mobile Sound Book' application operated by the Korea Welfare Foundation for the Visually Impaired.

At the heart of the collaboration is fast voice synthesis based on deep learning and DeepVoice technology. Given only 20 minutes of recorded speech, the system analyzes the speaker's voice, tone, intonation, pronunciation, and speaking rate, and can then read any sentence naturally in that voice. Conventional text-to-speech technology breaks speech down into consonants and vowels and then stitches sounds together. This AI, by contrast, analyzes and learns from the recorded corpus as it is.

The two companies will also work together on new businesses, such as global content projects featuring Hallyu stars and characters. Because the production period is short, the time needed for commercialization is expected to fall significantly. For Hallyu stars or famous characters who already have recordings, voices can be generated as soon as text is entered, without any additional recording sessions.

Lee Seok-jin, head of SK C&C's Channel & Marketing Group, said, "Through this cooperation, we will strive to create a range of social value programs that help improve education and cultural life for the visually impaired."