"Seamless Conversational Voice Recognition"... OpenAI Unveils Three AI Models for Creating Voice Assistants
OpenAI has unveiled three new voice models for developers, enabling the creation of voice assistants capable of real-time conversation and translation.
OpenAI's real-time speech API model 'GPT-Realtime-Translate' translates a conversation held in German and French in real time. Source: OpenAI.
On May 7 (local time), OpenAI released its voice application programming interface (API) models: GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper. They handle conversation, translation, and speech-to-text, respectively. All three models respond to user voice input in real time. OpenAI explained that these models not only excel in speed and naturalness but also understand user intent and context, offering responses tailored to the situation.
GPT-Realtime-2 is a voice model with inference capabilities, designed for real-time voice interactions. It demonstrates high domain comprehension, including specialized terminology and proper nouns, and can adjust its tone according to different contexts.
GPT-Realtime-Translate supports translation of over 70 languages into 13 target languages, creating a multilingual voice environment. Because it translates in real time to match the speaker's pace, it is expected to be highly useful for platforms that produce content for global audiences. GPT-Realtime-Whisper utilizes speech-to-text technology, which can assist in generating subtitles or meeting transcripts.
All three models are available through OpenAI's Realtime API. GPT-Realtime-2 is priced at $32 per 1 million audio input tokens and $64 per 1 million output tokens. GPT-Realtime-Translate and GPT-Realtime-Whisper cost $0.034 per minute and $0.017 per minute, respectively.
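To make the pricing concrete, the following is a minimal sketch of a cost estimate using only the rates quoted above. The function names and the token counts passed in are hypothetical; the article does not say how many tokens a minute of audio consumes, so token usage must come from actual API usage figures.

```python
# Rates as quoted in the article (USD). They are not verified against
# OpenAI's official price list and may change.
REALTIME_2_INPUT_PER_MILLION = 32.0    # per 1 million audio input tokens
REALTIME_2_OUTPUT_PER_MILLION = 64.0   # per 1 million output tokens
TRANSLATE_PER_MINUTE = 0.034           # GPT-Realtime-Translate
WHISPER_PER_MINUTE = 0.017             # GPT-Realtime-Whisper


def realtime_2_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the GPT-Realtime-2 cost for the given token counts."""
    total = (input_tokens * REALTIME_2_INPUT_PER_MILLION
             + output_tokens * REALTIME_2_OUTPUT_PER_MILLION)
    return total / 1_000_000


def per_minute_cost(minutes: float, rate_per_minute: float) -> float:
    """Estimate the cost of a per-minute model for a given duration."""
    return minutes * rate_per_minute


if __name__ == "__main__":
    # Hypothetical workload: token counts here are illustrative only.
    print(f"Realtime-2: ${realtime_2_cost(250_000, 100_000):.2f}")
    print(f"Translate, 60 min: ${per_minute_cost(60, TRANSLATE_PER_MINUTE):.2f}")
    print(f"Whisper, 60 min: ${per_minute_cost(60, WHISPER_PER_MINUTE):.2f}")
```

At the quoted rates, an hour of real-time translation comes to about $2.04 and an hour of transcription to about $1.02.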
With the introduction of these models, enterprise adoption of voice agents is expected to accelerate, and voice-based interfaces for giving task instructions are expected to spread. OpenAI stated, "We have introduced three audio models in our API to enable developers to build a new dimension of voice apps. This will allow for more natural, intelligent, and real-time responsive voice environments." According to market research firm Grand View Research, the global voice agent market is projected to grow at an average annual rate of 39% from 2026 to 2033, reaching $35.24 billion (approximately 52 trillion won).
© The Asia Business Daily(www.asiae.co.kr). All rights reserved.