Samsung Electronics Unveils AI Work Performance Metric "TRUEBench"
2,485 Detailed Items Reflecting Real-World Work Environments
Simultaneous Comparison of Up to Five Models, Support for 12 Languages
Data Samples and Leaderboard Released on Global Open-Source Platform
On September 25, Samsung Electronics unveiled a new evaluation metric, TRUEBench, which quantifies how much artificial intelligence (AI) actually helps in real-world work environments.
TRUEBench is an evaluation standard developed by Samsung Research, the advanced research and development organization of Samsung Electronics' DX Division, drawing on the company's in-house experience with generative AI. Whereas previous evaluations focused mainly on simple question-and-answer tasks, primarily in English, TRUEBench is designed to assess a wide range of practical office tasks, including document summarization and translation, data analysis, and multi-turn conversation. It consists of 10 categories, 46 work types, and 2,485 detailed items.
According to Samsung Electronics, up to five models can be compared at once, and the published results cover not only the accuracy of answers but also their length and efficiency, enabling more granular evaluation. TRUEBench supports a total of 12 languages, including English, Korean, Japanese, Chinese, and Spanish, so users can check translation performance across different language combinations.
Samsung Electronics also released TRUEBench data samples and a leaderboard on Hugging Face, a global open-source platform. The evaluation process adopts a cross-validation method in which AI re-examines the criteria written by humans, which the company says enhances objectivity and consistency.
Jeon Kyunghoon, Head of Samsung Research and CTO of the DX Division, stated, "Based on a variety of real-world applications, we have secured differentiated productivity AI technology and expertise. By releasing TRUEBench, we aim to set a global productivity evaluation standard and strengthen our technological leadership."
© The Asia Business Daily(www.asiae.co.kr). All rights reserved.