AI Startup VIDRAFT Unveils MARL Inference Middleware to Reduce Hallucinations, Launches Globally

by Lee Sanghyun

Published 11 Mar. 2026 16:35 (KST)

VIDRAFT, an AI startup based at Seoul AI Hub, has unveiled "MARL (Model-Agnostic Runtime Middleware for LLMs)," an inference middleware designed to reduce hallucinations in large language models (LLMs) and strengthen their ability to correct themselves. MARL was released simultaneously on Hugging Face, GitHub, PyPI, and "ClawHub," the skill marketplace of OpenClaw, an AI agent platform with a developer community of more than 260,000 members.

MARL is middleware that automatically runs a multi-stage self-verification pipeline at runtime to reduce LLM hallucinations, without requiring the model to be retrained (fine-tuned). With a single-line code change, it can be applied immediately to most LLMs that support the OpenAI API format, including GPT-5.4, Claude, Gemini, DeepSeek, Grok, and Llama. The core inference engine is distributed only in compiled binary form, while the interfaces needed for testing and integration have been made public.

Metacognition refers to an AI's ability to recognize that its answers may be incorrect and to correct itself. Current LLMs, however, face two structural limitations in this area.

The first is the metacognition gap. VIDRAFT's AI metacognition benchmark, "FINAL Bench," evaluated a total of 1,800 cases across nine leading models, including GPT-5.2, Claude Opus 4.6, and Gemini 3 Pro. The results revealed a significant gap between the models' ability to recognize that they might be wrong (MA = 0.694) and their actual ability to correct errors (ER = 0.302).
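Assuming both scores sit on the same 0-to-1 scale (the article does not define MA or ER, so this is an assumption), the size of the gap can be worked out directly from the two reported figures:

```latex
\text{MA} - \text{ER} = 0.694 - 0.302 = 0.392,
\qquad
\frac{\text{ER}}{\text{MA}} = \frac{0.302}{0.694} \approx 0.435
```

In other words, the measured error-correction score is a little under half of the recognition score.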

The second is a structural limitation on self-negation. Because LLMs generate text autoregressively, one token at a time, it is difficult for them to change direction once an answer is under way. If the initial assumption is wrong, the model keeps reasoning in that direction, producing hallucinations delivered with high confidence.

To address these issues, MARL splits a single LLM call into multiple expert roles: a planner designs the approach, an executor drafts the answer, a verifier points out errors, and an integrator incorporates that feedback to produce a new final answer. Only the refined final answer is returned to the user, while each stage of the inference process is recorded so it can be reviewed if needed.
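The four-role flow described above can be sketched in Python. This is a minimal illustration, not VIDRAFT's implementation, whose core engine ships only as a compiled binary: the `run_pipeline` function, the `PipelineResult` type, the role prompts, and the stub `echo_llm` backend are all hypothetical, and `llm` stands in for any function that sends a prompt to a model and returns its text.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical backend type: any function mapping a prompt to a completion.
# In practice this might wrap an OpenAI-format chat endpoint.
LLM = Callable[[str], str]


@dataclass
class PipelineResult:
    answer: str  # only this is shown to the user
    trace: List[Tuple[str, str]] = field(default_factory=list)  # (role, output) per stage


def run_pipeline(llm: LLM, question: str) -> PipelineResult:
    """Planner -> executor -> verifier -> integrator, logging every stage."""
    trace: List[Tuple[str, str]] = []

    # Planner: decide how to approach the question.
    plan = llm(f"[planner] Outline an approach to: {question}")
    trace.append(("planner", plan))

    # Executor: draft an answer that follows the plan.
    draft = llm(f"[executor] Following this plan:\n{plan}\nAnswer: {question}")
    trace.append(("executor", draft))

    # Verifier: adversarially review the draft for errors.
    critique = llm(f"[verifier] List errors or unsupported claims in:\n{draft}")
    trace.append(("verifier", critique))

    # Integrator: write a new final answer using the critique.
    final = llm(
        f"[integrator] Question: {question}\nDraft: {draft}\n"
        f"Critique: {critique}\nWrite a corrected final answer."
    )
    trace.append(("integrator", final))

    return PipelineResult(answer=final, trace=trace)


def echo_llm(prompt: str) -> str:
    # Stub backend for demonstration: reports which role prompted it.
    role = prompt.split("]")[0].lstrip("[")
    return f"{role} output"


result = run_pipeline(echo_llm, "What is 2 + 2?")
print(result.answer)                       # integrator output
print([role for role, _ in result.trace])  # ['planner', 'executor', 'verifier', 'integrator']
```

Because the orchestration depends only on the `llm` callable, swapping the underlying model means passing a different function while the pipeline layer stays the same, which mirrors the model-agnostic design the article describes. The real product's prompts, number of stages, and stopping rules are not described in the article and are not reproduced here.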

Through these multi-stage interactions, the structure surfaces perspectives that do not emerge from a single model call, and the final answer is generated only after an adversarial verification step has reviewed the draft for errors. In the FINAL Bench study, applying this metacognitive structure improved performance by more than 70% on the most difficult problems, with 94.8% of that improvement attributed to enhanced self-correction.

Beyond general inference enhancement, MARL includes specialized emergent engines for nine domains, such as drug discovery, creative work, and law, which make it useful for domain-specific idea generation.

Fine-tuning typically costs tens of millions of won and takes several weeks, while retrieval-augmented generation (RAG) focuses on supplementing external knowledge and is poorly suited to correcting reasoning errors themselves. MARL, by contrast, changes the inference structure at runtime without altering model weights. Because the MARL layer stays in place even when the underlying model is swapped, it gives companies pursuing multi-LLM strategies a consistent quality-control tool.

MARL has also been officially listed on ClawHub, the marketplace of the AI agent platform OpenClaw, which has surpassed 260,000 GitHub stars. Among the more than 3,200 skills on the marketplace, it is the first middleware in the "Inference Enhancement" category, giving AI agents a structure for thinking things through before they act. Whereas OpenClaw agents handle execution, MARL is described as playing the "brain" role by strengthening metacognition.

A VIDRAFT spokesperson stated, "Through the FINAL Bench study, we confirmed that even the most advanced AIs significantly lack the 'ability to recognize what they do not know.' MARL is not about changing the model, but about changing the very way AI thinks. We believe that transparently showing the process of self-doubt and correction is the starting point for trust."

VIDRAFT plans to launch an Enterprise Edition in the first half of 2026 and to submit the results of its FINAL Bench-based effectiveness verification to international academic journals. The company has also completed a proof of concept (PoC) with a local company as part of its entry into the U.S. market and is currently working on localization.

VIDRAFT was founded in 2024, is based at Seoul AI Hub, and aims to develop True-AGI by 2030. The company developed the AI metacognition benchmark "FINAL Bench," which has ranked fifth globally among Hugging Face datasets, and its leaderboard was selected as a "Space of the Week." VIDRAFT also recently signed a joint R&D agreement with Wearcube, a global manufacturer of broadcast monitors.

VIDRAFT has also achieved several other milestones, including ranking second globally in the medical AI field (CNRS validation) on Google DeepMind's FACTS Grounding Leaderboard, ranking fourth globally on Hugging Face's heatmap leaderboard, and being named to the 2024 STAR AI TOP 12 as the only Korean company. The company currently has 2 million monthly active users, 30 million cumulative visitors, and more than 1,500 publicly available AI models.

This content was produced with the assistance of AI translation services.