KENTECH Develops Prompt Learning Technique to Enhance Vision-Language Model Recognition

by Min Chanki

Published 13 Nov. 2025 10:31 (KST)

(From left) Professor Lee Seokju of Korea University of Energy and Technology, researcher Sohee Kim, and researcher Gahyun Kim.

Korea University of Energy and Technology (KENTECH) announced on November 13 that Professor Lee Seokju’s research team has developed a new prompt learning technique for vision-language models (VLMs). The technique analyzes the visual distortions that can occur during data diversification so that the model learns only the essential attributes of images.

Existing vision-language models such as CLIP can understand the meaning of objects by combining images and text, but they have limitations in situations that require fine-grained attribute distinction.

To address this issue, the research team introduced the "Delta Meta Token." This token helps the model learn the relative changes between images, enabling it to distinguish attribute differences more precisely. As a result, the model becomes less sensitive to data variations while learning only the meaningful attributes for each class, allowing it to differentiate visually similar objects based on attribute cues. For example, previous methods would only learn the class information "dog," whereas the proposed technique learns common attributes such as ears, eyes, and fur. This enables the model to recognize essential attributes even when the type of dog or background changes, maintaining stable performance across datasets from different domains.

Conceptual diagram of the prompt learning technique developed by the KENTECH research team.

Despite its lightweight structure, which adds only minimal parameters to a pre-trained CLIP model, the proposed technique outperformed existing prompt learning methods on 11 benchmark datasets and demonstrated high generalization performance. It maintains robust recognition even when presented with new classes or domains, and is expected to be applied in various fields that require attribute-based visual recognition, such as autonomous driving, robot vision, and industrial video anomaly detection.

Gahyun Kim, co-first author and researcher, stated, "The autonomous and collaborative research environment at the KENTECH lab greatly helped foster a spirit of actively pursuing new ideas," adding, "We plan to explore the potential for expanding autonomous intelligent robots through future research on AI agent collaboration."

This research was supported by the Ministry of Trade, Industry and Energy and the National Research Foundation of Korea, and was published online on October 23 in the internationally renowned journal Pattern Recognition (Elsevier), which focuses on computer vision and machine learning.

This content was produced with the assistance of AI translation services.
