Developed by Professor Yeondon Jung’s Team at Korea University

AI Breaks the Bottleneck of Manual De-identification

Laying the Groundwork for Expanding Public Access and Judicial Transparency

A new technology has emerged that enables artificial intelligence (AI) to automatically remove names and resident registration numbers from court rulings, a process that was previously done manually. Professor Yeondon Jung's research team from the Department of Computer Science at Korea University has eliminated the bottleneck in the de-identification process, which relied heavily on manual work, laying the groundwork for expanding public access to court rulings and enhancing judicial transparency.

Professor Yeondon Jung's research team from the Department of Computer Science at Korea University developed an artificial intelligence model for de-identifying personal information in court rulings. Professor Jung (first from the left) and the researchers pose for a commemorative photo at EACL 2026, held in March in Rabat, Morocco. Photo: Korea University


On April 24, Korea University announced that Professor Jung's team has developed a technology that utilizes AI to rapidly and accurately de-identify sensitive personal information within court rulings.


Article 109 of the Constitution of the Republic of Korea stipulates that the proceedings and judgments of trials must, in principle, be made public. However, in the process of disclosure, identifying information such as names, resident registration numbers, and addresses must be redacted. The challenge has been that this work has relied almost entirely on manual effort. Processing a single document took about two weeks, resulting in only 5.97% of all rulings being eligible for public disclosure.


Although the courts also attempted to introduce automated systems, the existing technology achieved only about 8% accuracy in identifying personal information, making it unsuitable for real-world application.


To address this issue, the research team developed a large-scale benchmark dataset called "K-LegalDeID" and a high-performance AI model named "KLUEBERT-CRF." By combining 2,000 rulings spanning 39 legal fields with social networking service (SNS) conversation data, they constructed a training dataset of approximately 1 million cases. Based on this, the AI was designed to identify and remove personal information on its own by reading it in context. Notably, the research team incorporated a Conditional Random Field (CRF) technique to handle the agglutinative nature of the Korean language and the complexity of legal sentence structures, enabling more sophisticated analysis that takes into account the context preceding and following each sentence.
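As a rough illustration of how a sequence-labeling model's output can be turned into a redacted ruling, the sketch below assumes the tagger emits one BIO tag per token. The label names (NAME, ADDRESS), the sample sentence, and both helper functions are hypothetical and are not taken from the team's work; the KLUEBERT-CRF model itself is not reproduced here.

```python
from typing import List, Tuple


def bio_to_spans(tags: List[str]) -> List[Tuple[int, int, str]]:
    """Group BIO tags into (start, end_exclusive, label) entity spans."""
    spans = []
    start, label = None, None
    # A trailing "O" sentinel flushes an entity that ends on the last token.
    for i, tag in enumerate(tags + ["O"]):
        if tag.startswith("I-") and label == tag[2:]:
            continue  # still inside the current entity
        if label is not None:
            spans.append((start, i, label))
            start, label = None, None
        if tag.startswith(("B-", "I-")):
            # A stray "I-" tag is treated leniently as the start of an entity.
            start, label = i, tag[2:]
    return spans


def redact(tokens: List[str], tags: List[str]) -> List[str]:
    """Replace every tagged entity with a single placeholder such as [NAME]."""
    out, prev_end = [], 0
    for start, end, label in bio_to_spans(tags):
        out.extend(tokens[prev_end:start])
        out.append(f"[{label}]")
        prev_end = end
    out.extend(tokens[prev_end:])
    return out


# Hypothetical example; in practice the tags would come from the model.
tokens = ["Defendant", "Kim", "Minsu", "resides", "in", "Seoul"]
tags = ["O", "B-NAME", "I-NAME", "O", "O", "B-ADDRESS"]
print(" ".join(redact(tokens, tags)))
```

In a tagger of this kind, a CRF scores whole tag sequences rather than individual tokens, which discourages invalid outputs such as an I-NAME tag that directly follows an unrelated entity.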


As a result, the model achieved a world-class "entity-level micro F1 score" of 0.9923. At the same time, the number of parameters was reduced to about 110 million, making the model 20 times lighter than existing models and ensuring practical efficiency suitable for actual court environments.
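For readers unfamiliar with the metric, an entity-level micro F1 score counts a prediction as correct only when both the boundaries and the type of an entity match the reference exactly, and pools those counts over all documents before computing precision and recall. The sketch below shows that calculation under those standard definitions; the span format and the sample data are made up for illustration and do not come from K-LegalDeID.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, label); hypothetical format


def entity_micro_f1(gold: List[List[Span]], pred: List[List[Span]]) -> float:
    """Micro-averaged F1 over exact-match entities, pooled across documents."""
    tp = fp = fn = 0
    for gold_spans, pred_spans in zip(gold, pred):
        g, p = set(gold_spans), set(pred_spans)
        tp += len(g & p)   # same boundaries and same label
        fp += len(p - g)   # predicted but not in the reference
        fn += len(g - p)   # in the reference but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Made-up example: two documents, one entity given the wrong label.
gold = [[(1, 3, "NAME"), (5, 6, "ADDRESS")], [(0, 1, "RRN")]]
pred = [[(1, 3, "NAME"), (5, 6, "NAME")], [(0, 1, "RRN")]]
f1 = entity_micro_f1(gold, pred)
print(round(f1, 4))
```

Because the wrongly labeled entity counts as both a false positive and a false negative, a single type error lowers precision and recall at the same time.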


Professor Jung stated, "By building a high-quality dataset, we have addressed the chronic data shortage in the domestic legal tech field. Through the automation of de-identification in court rulings, we will be able to dramatically expand the scope of public disclosure and enhance trust in the judiciary."



This research was presented at "EACL 2026," an international conference on natural language processing held in Rabat, Morocco, on March 26, and a domestic patent application has also been filed.


This content was produced with the assistance of AI translation services.

© The Asia Business Daily(www.asiae.co.kr). All rights reserved.
