Kakao's Self-Developed Orchestration Benchmark Accepted as Paper at Top 3 AI Conference
Paper Accepted at ICLR 2026, a Leading International AI Conference
Released as Open Source on GitHub
Kakao announced on February 3 that its 'Orchestration Benchmark,' an evaluation metric for the orchestration capability of its self-developed artificial intelligence (AI) model, has been accepted as a paper at the International Conference on Learning Representations (ICLR) 2026, one of the world's top three AI conferences.
According to Kakao, ICLR is considered one of the premier academic conferences in the AI field. For ICLR 2026, approximately 19,000 papers were submitted, of which only about 28% were accepted, underscoring the fierce competition.
Photo by Kakao
Kakao developed this benchmark in response to the latest technological trend in which large language models (LLMs) are evolving from simple question-and-answer systems to agentic AI that can plan and coordinate a variety of tasks requested by users. The benchmark focuses on evaluating multi-step and multi-domain processing capabilities in real-world service environments.
Existing LLM benchmarks have only enabled fragmented performance assessments, making it difficult to evaluate orchestration capabilities such as invoking appropriate tools and managing dependencies between tasks. To address this, Kakao introduced a new framework that separates the evaluation of AI models' task planning and tool execution based on actual service scenarios.
This orchestration benchmark consists of 17 service domains, including travel, shopping, finance, and scheduling, and incorporates 100 virtual tools. Scenarios were built based on real service dialogue flows, such as changes in user requests or follow-up questions. All evaluation data was manually constructed and verified by annotators, and the benchmark supports both Korean and English. For Korean, cultural characteristics and context were reflected to enhance the accuracy and reliability of the evaluation.
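To illustrate the kind of check such a framework performs, the sketch below scores a model's predicted tool-call plan against a scenario's required tools and inter-task dependencies. The function and data names (`plan_is_valid`, `search_flights`, etc.) are illustrative assumptions, not Kakao's actual benchmark schema.

```python
# Hypothetical sketch of orchestration-plan scoring; names and
# structure are assumptions for illustration, not Kakao's schema.

def plan_is_valid(plan, required_tools, dependencies):
    """Check that a predicted tool-call plan invokes exactly the
    required tools and respects every (before, after) dependency."""
    if set(plan) != set(required_tools):  # correct tools, no extras
        return False
    order = {tool: i for i, tool in enumerate(plan)}
    # every dependency (a, b) means tool a must run before tool b
    return all(order[a] < order[b] for a, b in dependencies)

# Example travel scenario: flights must be searched before booking,
# and booked before payment.
required = ["search_flights", "book_flight", "pay"]
deps = [("search_flights", "book_flight"), ("book_flight", "pay")]

good_plan = ["search_flights", "book_flight", "pay"]
bad_plan = ["pay", "search_flights", "book_flight"]
```

A real benchmark would additionally check tool arguments and multi-turn context, but separating "did the model pick and order the right tools" from "did each call execute correctly" mirrors the planning/execution split described above.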
To promote the global AI ecosystem and the free use of its research and development outcomes, Kakao has released the benchmark as open source on GitHub.
A Kakao representative stated, "The orchestration benchmark is a core tool for assessing whether agentic AI can operate reliably in real service environments. This is a meaningful research achievement at a time when agentic AI is evolving to make decisions and execute tasks autonomously. We will continue to conduct research to strengthen collaboration and orchestration capabilities among agents and will advance technologies that align with Kakao's direction for agentic AI."
© The Asia Business Daily(www.asiae.co.kr). All rights reserved.