Four AI Models Unveiled, Including Speech-Enabled Large Language Model

On April 2, Krafton launched its artificial intelligence (AI) model brand "Raon" and released its language and speech models, as well as its vision encoder, as open source on the global platform Hugging Face.


Krafton's AI model brand Raon. Provided by Krafton



The name Raon comes from a native Korean word meaning "joy," and its English spelling is drawn from part of the company's name. The brand reflects Krafton's philosophy of using AI technology to deliver the fundamental fun of gaming, and the company plans to build its global AI competitiveness around Raon.


The open-source models are Raon-Speech, Raon-SpeechChat, Raon-OpenTTS, and Raon-VisionEncoder. According to Krafton, Raon-Speech extends a text-centric language model so that it can understand and generate speech. With 9 billion parameters, it ranked first globally in both English and Korean among open-source speech language models with fewer than 10 billion parameters. The ranking is based on an evaluation spanning 40 benchmarks across seven core tasks, such as speech-to-text conversion and speech-based Q&A, with the per-task rankings averaged at equal weight.


Raon-SpeechChat is the first real-time bidirectional speech model announced in Korea, allowing the user and the model to interrupt each other freely during a conversation. Across three benchmarks for bidirectional conversation models, it ranked first globally by average rank over 13 key tasks, including backchanneling, interruption handling, and response latency. The text-to-speech model Raon-OpenTTS, which was trained solely on open speech data, also achieved best-in-class results in blind evaluations against research TTS models from around the world. Krafton collected, refined, and publicly released some data that had previously been difficult to use, and anyone can reproduce the entire training dataset in the same environment.


Raon-VisionEncoder converts images into a representation that AI models can work with; paired with a language model, it allows that model to process visual information. According to Krafton, the model was trained entirely in-house from scratch and outperformed SigLIP2, Google's representative vision encoder, on certain visual recognition tasks. On other tasks, it reached more than 90% of SigLIP2's performance. The technology is also set to be used in Krafton's "proprietary AI foundation model" project.



Kangwook Lee, Chief AI Officer (CAIO) of Krafton, said, "The release of the Raon model series is an important milestone in accumulating AI technology capabilities. By sharing large-scale training data and core models as open source, we hope researchers and developers can use them freely, contributing to the advancement of multimodal technologies and the growth of the domestic AI ecosystem."


This content was produced with the assistance of AI translation services.

© The Asia Business Daily (www.asiae.co.kr). All rights reserved.
