KENTECH Develops Prompt Learning Technique to Enhance Vision-Language Model Recognition
(From left) Professor Lee Seokju of Korea University of Energy and Technology, with researchers Sohee Kim and Gahyun Kim.
Korea University of Energy and Technology (KENTECH) announced on November 13 that Professor Lee Seokju’s research team has developed a new prompt learning technique for vision-language models (VLMs). By analyzing the visual distortions that can arise during data diversification, the technique is designed to ensure that a model learns only the essential attributes of images.
Existing vision-language models such as CLIP can understand the meaning of objects by combining images and text, but they have limitations in situations that require fine-grained attribute distinction.
To address this issue, the research team introduced the "Delta Meta Token." This token helps the model learn the relative changes between images, enabling it to distinguish attribute differences more precisely. As a result, the model becomes less sensitive to data variations while learning only the meaningful attributes for each class, allowing it to differentiate visually similar objects based on attribute cues. For example, previous methods would only learn the class information "dog," whereas the proposed technique learns common attributes such as ears, eyes, and fur. This enables the model to recognize essential attributes even when the type of dog or background changes, maintaining stable performance across datasets from different domains.
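As a rough illustration of the idea described above, the following toy sketch computes a token from the change between an image embedding and the embedding of a distorted view of the same image, then uses that token to condition class embeddings before a CLIP-style cosine-similarity classification. Everything here is an assumption made for illustration: the function names, the tanh projection, the additive conditioning, and the hand-written vectors standing in for real CLIP features are not taken from the paper, and the text encoder is omitted entirely.

```python
import math

def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]

def delta_meta_token(feat_orig, feat_aug, weight):
    """Project the embedding change (augmented minus original) to a token.

    `weight` is a hypothetical learned d x d matrix given as a list of rows.
    """
    delta = [a - o for a, o in zip(feat_aug, feat_orig)]
    return [math.tanh(sum(w * x for w, x in zip(row, delta))) for row in weight]

def condition_classes(class_embs, token):
    """Shift every class embedding by the token, then re-normalize.

    Assumption: a simple additive stand-in for feeding the token to a text encoder.
    """
    return [normalize([c + t for c, t in zip(emb, token)]) for emb in class_embs]

def classify(image_feat, class_embs):
    """Return (best class index, cosine scores) for one image embedding."""
    img = normalize(image_feat)
    scores = [sum(i * c for i, c in zip(img, emb)) for emb in class_embs]
    return max(range(len(scores)), key=scores.__getitem__), scores

if __name__ == "__main__":
    # Hand-written stand-ins for CLIP features (assumption, not real data).
    feat_orig = [0.5, -1.0, 2.0]
    feat_aug = [0.6, -0.9, 1.8]   # the same image after a mild distortion
    weight = [[0.1, 0.2, 0.3], [0.0, 0.1, 0.0], [0.3, 0.0, 0.1]]
    class_embs = [normalize([1.0, 0.0, 0.0]), normalize([0.0, 1.0, 1.0])]

    token = delta_meta_token(feat_orig, feat_aug, weight)
    index, scores = classify(feat_orig, condition_classes(class_embs, token))
    print("token:", token)
    print("predicted class index:", index)
```

In this sketch an undistorted image yields a zero token, so the class embeddings are left unchanged. The token carries information only about how the image changed, which is the intuition the article describes, though the actual method may realize it quite differently.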
Conceptual diagram of the prompt learning technique developed by the KENTECH research team.
Despite its lightweight structure, which adds only minimal parameters to a pre-trained CLIP model, the proposed technique outperformed existing prompt learning methods on 11 benchmark datasets and demonstrated high generalization performance. It maintains robust recognition even when presented with new classes or domains, and is expected to be applied in various fields that require attribute-based visual recognition, such as autonomous driving, robot vision, and industrial video anomaly detection.
Gahyun Kim, co-first author and researcher, stated, "The autonomous and collaborative research environment at the KENTECH lab greatly helped foster a spirit of actively pursuing new ideas," adding, "We plan to explore the potential for expanding autonomous intelligent robots through future research on AI agent collaboration."
This research was supported by the Ministry of Trade, Industry and Energy and the National Research Foundation of Korea, and was published online on October 23 in the internationally renowned journal Pattern Recognition (Elsevier), which focuses on computer vision and machine learning.
© The Asia Business Daily(www.asiae.co.kr). All rights reserved.