OpenAI has unveiled three new voice models for developers, enabling the creation of voice assistants capable of real-time conversation and translation.

OpenAI's real-time speech API model GPT-Realtime-Translate translating a conversation held in German and French in real time. (Photo: OpenAI)

On May 7 (local time), OpenAI released three voice models for its application programming interface (API): GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper. They handle conversation, translation, and speech-to-text, respectively, and all three respond to users' spoken input in real time. According to OpenAI, the models are not only fast and natural-sounding but also grasp the user's intent and context, tailoring their responses to the situation.


GPT-Realtime-2 is a voice model with reasoning capabilities, built for real-time spoken interaction. It shows strong comprehension of specialized domains, including technical terminology and proper nouns, and can adjust its tone to suit the context.


GPT-Realtime-Translate translates speech from more than 70 input languages into 13 output languages, enabling multilingual voice experiences. Because it translates in real time, keeping pace with the speaker, it is expected to be especially useful for platforms that produce content for global audiences. GPT-Realtime-Whisper is a speech-to-text model that can help generate subtitles or meeting transcripts.


All three models are available through OpenAI's Realtime API. GPT-Realtime-2 is priced at $32 per 1 million audio input tokens and $64 per 1 million output tokens. GPT-Realtime-Translate and GPT-Realtime-Whisper cost $0.034 per minute and $0.017 per minute, respectively.
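To make the two pricing schemes easier to compare, here is a rough cost estimator using only the prices reported above. It is a sketch under stated assumptions: the prices are taken as reported in this article (check OpenAI's pricing page before relying on them), and because the article does not say how many tokens a minute of audio consumes, GPT-Realtime-2 costs are computed from token counts supplied by the caller. The function names and the example session sizes are hypothetical.

```python
# Prices as reported in the article (USD); verify against OpenAI's official pricing.
REALTIME2_INPUT_PER_M = 32.0   # GPT-Realtime-2, per 1 million audio input tokens
REALTIME2_OUTPUT_PER_M = 64.0  # GPT-Realtime-2, per 1 million output tokens
TRANSLATE_PER_MIN = 0.034      # GPT-Realtime-Translate, per minute
WHISPER_PER_MIN = 0.017        # GPT-Realtime-Whisper, per minute


def realtime2_cost(input_tokens: int, output_tokens: int) -> float:
    """Hypothetical helper: token-based cost of a GPT-Realtime-2 session in USD."""
    return (input_tokens / 1_000_000 * REALTIME2_INPUT_PER_M
            + output_tokens / 1_000_000 * REALTIME2_OUTPUT_PER_M)


def per_minute_cost(minutes: float, rate_per_min: float) -> float:
    """Hypothetical helper: duration-based cost for Translate or Whisper in USD."""
    return minutes * rate_per_min


if __name__ == "__main__":
    # Hypothetical session: 500k audio input tokens, 250k output tokens.
    print(realtime2_cost(500_000, 250_000))                 # 32.0
    # A one-hour meeting, transcribed vs. translated.
    print(round(per_minute_cost(60, WHISPER_PER_MIN), 2))   # 1.02
    print(round(per_minute_cost(60, TRANSLATE_PER_MIN), 2)) # 2.04
```

The two helpers are kept separate because the models are billed on different units: GPT-Realtime-2 by tokens, the other two by minutes of audio.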



The new models are expected to accelerate enterprise adoption of voice agents and to spread voice-based interfaces for issuing instructions. OpenAI stated, "We have introduced three audio models in our API to enable developers to build a new generation of voice apps. This will allow for more natural, intelligent voice environments that respond in real time." According to market research firm Grand View Research, the global voice agent market is projected to grow at a compound annual growth rate of 39% from 2026 to 2033, reaching $35.24 billion (approximately 52 trillion won) by 2033.


This content was produced with the assistance of AI translation services.

© The Asia Business Daily(www.asiae.co.kr). All rights reserved.