"Understanding Depth in Photos"... KENTECH Develops Prompt-Based Reasoning Technology

by Min Chanki

Published 01 Oct. 2025 10:09 (KST)

Professor Lee Seokju's Lightweight Prompt Learning Blueprint.

The Korea Institute of Energy Technology (KENTECH; Acting President Park Jin-ho) announced on October 1 that Professor Lee Seokju's research team has developed a lightweight prompt learning technology that enables three-dimensional spatial reasoning in Vision Language Models (VLMs). The team applied the technology to monocular depth estimation, which infers depth from a single camera image, and reported a significant improvement in the model's spatial understanding.

CLIP, a multimodal Vision Language Model, is an artificial intelligence system that understands images and text jointly and is widely used where computer vision and natural language processing converge. For example, given the word "cat," it can pick out cats from among countless photos. However, it has been limited in geometric spatial understanding tasks such as perceiving distance and depth.

To overcome this, the research team introduced a "non-human language prompt," a new form of representation optimized for machine comprehension rather than written in human language. This allows the model to estimate the depth of objects precisely using only photos or video captured by a camera.

Experimental results showed that the new technology achieved performance comparable to existing large-scale models (with over 300 million parameters) while training only about 1.1 million parameters. The number of parameters that must be trained was cut to roughly one three-hundredth, yet the model learned effectively without any loss in performance. Professor Lee Seokju emphasized, "This will establish itself as a core fundamental technology applicable to a variety of spatial computing fields where lightweight solutions are essential, such as autonomous driving, robot vision, and augmented reality."

This research was supported by the Ministry of Trade, Industry and Energy, the National Research Foundation of Korea, and the Korea Astronomy and Space Science Institute. It was published online on September 26 in Pattern Recognition, a world-renowned journal specializing in computer vision and machine learning.

This content was produced with the assistance of AI translation services.