Transforming Free-Text Test Results into Standardized Data Using LLMs

The National Institute of Health under the Korea Disease Control and Prevention Agency announced on the 11th that it has verified a technology that automatically converts coronary angiography examination records, written in free-text form, into standardized data using artificial intelligence (AI).


Technical overview diagram of AI-based automatic structuring technology for cardiovascular examination records. Korea Disease Control and Prevention Agency.


This study, conducted by a joint research team from Yonsei University College of Medicine and the National Health Insurance Service Ilsan Hospital with support from the National Institute of Health, focuses on using a large language model (LLM) to transform examination records freely written by medical staff into structured data that can be directly used for analysis.


Although coronary angiography reports contain key information for the diagnosis and treatment of cardiovascular diseases, they are mostly written in unstructured narrative form, making them difficult to use for large-scale research or policy analysis. As a result, cardiologists have had to read thousands of coronary angiography records themselves and manually organize the necessary information.


To address this, the research team developed an automatic structuring technology using LLMs such as ChatGPT and Gemini. In the first step, an LLM is used to convert narrative-style reports into a standardized structure designed by cardiology specialists. In the second step, based on the structured data, a rule-based algorithm is applied to automatically extract 12 key clinical indicators, including lesion location, stent information, and whether complex procedures were performed.
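The two-step flow described above can be sketched roughly as follows. This is a minimal illustration, not the research team's implementation: the JSON schema, the field names (`vessel`, `stenosis_pct`, `stent_implanted`, `bifurcation`, `chronic_total_occlusion`), the sample record, and the rule used for "complex procedure" are all assumptions made for the example, and only three indicators are shown instead of the 12 in the study. The LLM call itself is left out; the sketch starts from the JSON text such a model might return.

```python
import json
from dataclasses import dataclass


@dataclass
class Lesion:
    """One lesion in a hypothetical standardized report structure."""
    vessel: str                      # e.g. "LM", "LAD", "LCX", "RCA"
    stenosis_pct: int                # degree of stenosis in percent
    stent_implanted: bool = False
    bifurcation: bool = False
    chronic_total_occlusion: bool = False


def parse_structured_report(llm_json: str) -> list[Lesion]:
    """Step 1 (after the LLM call): load the structured JSON into typed records."""
    return [Lesion(**item) for item in json.loads(llm_json)["lesions"]]


def extract_indicators(lesions: list[Lesion]) -> dict:
    """Step 2: apply fixed rules to the structured records.

    The rule for "complex_procedure" is illustrative only, not the study's definition.
    """
    treated = [l for l in lesions if l.stent_implanted]
    is_complex = len(treated) >= 3 or any(
        l.vessel == "LM" or l.bifurcation or l.chronic_total_occlusion
        for l in treated
    )
    return {
        "lesion_locations": sorted({l.vessel for l in lesions}),
        "stent_count": len(treated),
        "complex_procedure": is_complex,
    }


# Made-up example of what the LLM step might return for one report.
EXAMPLE_LLM_OUTPUT = """
{"lesions": [
  {"vessel": "LAD", "stenosis_pct": 90, "stent_implanted": true, "bifurcation": true},
  {"vessel": "RCA", "stenosis_pct": 50}
]}
"""

if __name__ == "__main__":
    print(extract_indicators(parse_structured_report(EXAMPLE_LLM_OUTPUT)))
```

Keeping the second step rule-based means the indicator definitions stay explicit and auditable: a change to what counts as a complex procedure is a change to one function, and the LLM output does not need to be regenerated.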


Through this process, coronary angiography reports that were previously recorded as free text are automatically organized into table-type data that can be analyzed immediately.


When the automatically structured data were checked, accuracy on the major items ranged from 96% to 99%. On some indicators, the method even outperformed manual work by specialists, demonstrating its potential for use in clinical research.
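As a rough illustration of how per-item accuracy of this kind can be computed, the sketch below compares automatically extracted values against reference labels, item by item. The function name, the item names, and the toy records are hypothetical; they are not the study's data or its evaluation code.

```python
def per_item_accuracy(predicted: list[dict], reference: list[dict]) -> dict:
    """Fraction of records where the extracted value equals the reference, per item."""
    if len(predicted) != len(reference) or not reference:
        raise ValueError("predicted and reference must be non-empty and equal in length")
    return {
        item: sum(p[item] == r[item] for p, r in zip(predicted, reference)) / len(reference)
        for item in reference[0]
    }


# Toy data: four records, two items (values are invented for the example).
reference = [
    {"stent_count": 1, "complex_procedure": True},
    {"stent_count": 0, "complex_procedure": False},
    {"stent_count": 2, "complex_procedure": False},
    {"stent_count": 1, "complex_procedure": False},
]
predicted = [
    {"stent_count": 1, "complex_procedure": True},
    {"stent_count": 0, "complex_procedure": False},
    {"stent_count": 2, "complex_procedure": True},   # one disagreement
    {"stent_count": 1, "complex_procedure": False},
]

accuracy = per_item_accuracy(predicted, reference)
```

On this toy data, `accuracy["stent_count"]` is 1.0 and `accuracy["complex_procedure"]` is 0.75.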


The results of this study have been published in the international journal Scientific Reports.


Nam Jae-hwan, Director of the National Institute of Health, said, "This achievement is expected to significantly facilitate the use of medical data in large-scale epidemiological studies and in the selection of clinical trial participants related to cardiovascular diseases. In particular, linking it with the sex-based medicine research being conducted at the National Institute of Health will help strengthen a research infrastructure for cardiovascular diseases that takes sex-specific characteristics into account."



Lim Seung-kwan, Commissioner of the Korea Disease Control and Prevention Agency, said, "By automating the curation of medical records, the efficiency of research and policy making will be greatly enhanced."


This content was produced with the assistance of AI translation services.

© The Asia Business Daily(www.asiae.co.kr). All rights reserved.
