“Excelling Without Experience”... UNIST Develops New AI Reinforcement Learning Technique

Published 21 Apr. 2025 10:11 (KST)

Updated 31 Jul. 2025 14:23 (KST)

Development of a Reinforcement Learning Technique That Maintains Stable Performance Even in Unexpected Situations
Applicable to Robotics and Autonomous Driving... Selected for Oral Presentation at ICLR, One of the Top Three AI Conferences

An artificial intelligence technology capable of making correct decisions on its own, even in unfamiliar situations, has been developed.

This advancement is expected to enhance the safety and reliability of AI in scenarios such as autonomous vehicles encountering snowy roads or robots facing sudden changes in the weight of boxes they are transporting.

The research team led by Professor Sunghwan Yoon at the UNIST Graduate School of Artificial Intelligence has developed a reinforcement learning technique that maintains its performance even when environmental conditions change. Their work was selected for an oral presentation at ICLR 2025, one of the world’s top three AI conferences. Of the 11,672 papers submitted, only 207 (less than 2%) were chosen for oral presentation.

Research team (Researcher Hyungyu Lee (left) and Professor Sunghwan Yoon). Provided by UNIST

Unlike supervised learning, which provides the correct answers in advance, reinforcement learning is a more human-like learning method in which AI seeks a problem-solving strategy, or ‘policy’, by maximizing the rewards it obtains through trial and error. However, conventional reinforcement learning often suffers a sharp drop in performance in unfamiliar environments it was not trained on.

The research team proposed a learning method that reduces how sensitive the cumulative reward is to small changes in the policy. It flattens the cumulative-reward surface over the policy parameter space, so that slight adjustments to the agent’s behavior do not cause excessively large swings in cumulative reward.
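To make the idea of a “flat” reward surface concrete, the sketch below compares two made-up, one-parameter reward curves: one with a tall, narrow peak and one with a slightly lower but wide peak. For each, it measures the largest reward loss within a small radius of the peak, a sharpness measure in the spirit of SAM. The functions sharp_return, flat_return and worst_case_drop, and all the numbers, are hypothetical illustrations rather than anything taken from the research.

```python
import math


def sharp_return(theta: float) -> float:
    # Hypothetical cumulative reward with a narrow peak: high reward, but only near theta = 0.
    return 10.0 * math.exp(-50.0 * theta ** 2)


def flat_return(theta: float) -> float:
    # Hypothetical cumulative reward with a wide peak: slightly lower maximum, tolerant of change.
    return 9.0 * math.exp(-0.5 * theta ** 2)


def worst_case_drop(J, theta: float, eps: float, steps: int = 201) -> float:
    """Largest reward loss over perturbations |delta| <= eps, found by grid search."""
    base = J(theta)
    deltas = [-eps + 2 * eps * i / (steps - 1) for i in range(steps)]
    return max(base - J(theta + d) for d in deltas)


eps = 0.2
for name, J in [("sharp", sharp_return), ("flat", flat_return)]:
    print(f"{name}: peak reward = {J(0.0):.2f}, "
          f"worst drop within +/-{eps} = {worst_case_drop(J, 0.0, eps):.2f}")
```

The sharp curve has the higher peak, yet a parameter shift of 0.2 costs it roughly 8.6 of its 10 reward units, while the flat curve loses less than 0.2. A method that prefers flat regions gives up a little peak reward in exchange for that stability.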

With conventional methods, even a slight delay in deceleration by an autonomous vehicle on a snowy road could lead to a significant loss in rewards, causing overall performance to collapse. In contrast, the proposed method maintains consistent performance even when the policy changes slightly.

When physical factors such as friction or weight were altered on actual robots, the proposed technique remained highly stable and robust, retaining 80–90% of its average reward. Under the same conditions, the average reward of existing methods dropped to less than half.
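The article does not define “reward retention.” Assuming it means a policy’s average reward after the physical change divided by that same policy’s average reward under the original conditions, the comparison can be sketched as follows. The function name and every episode return below are invented for illustration and are not the paper’s data.

```python
from statistics import mean


def reward_retention(nominal_rewards, shifted_rewards):
    """Hypothetical metric: mean reward after the change, as a fraction of the nominal mean."""
    return mean(shifted_rewards) / mean(nominal_rewards)


# Made-up episode returns, for illustration only (not the paper's data).
robust_nominal = [1000.0, 980.0, 1020.0]
robust_shifted = [870.0, 850.0, 880.0]
baseline_nominal = [1050.0, 1000.0, 950.0]
baseline_shifted = [430.0, 470.0, 450.0]

print(f"robust:   {reward_retention(robust_nominal, robust_shifted):.0%}")
print(f"baseline: {reward_retention(baseline_nominal, baseline_shifted):.0%}")
```

Under this assumed definition, a value in the 0.8 to 0.9 range corresponds to the “80–90%” figure, and a value below 0.5 corresponds to rewards “dropping to less than half.”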

Hyungyu Lee, the first author and researcher, explained, “To reduce the sensitivity of cumulative rewards in the reinforcement learning parameter space, we adopted the SAM (Sharpness-Aware Minimization) technique from the field of supervised learning to develop this method. It is both effective and easy to implement.”

In supervised learning, AI models are trained on a ‘loss function’ that measures how far predictions deviate from the correct answers. SAM seeks flat minima: regions of parameter space where the loss does not spike when the parameters change slightly. The research team extended this idea to reinforcement learning, applying it to the cumulative reward instead of the loss and steering training toward regions where the cumulative reward does not change drastically.
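Below is a minimal sketch of the SAM idea with the sign flipped for reward maximization. It assumes exact gradients of a made-up two-parameter reward J; real reinforcement learning would instead use noisy policy-gradient estimates from sampled episodes. Each step first moves the parameters a distance rho in the direction that lowers the reward most (to first order), then takes an ascent step using the gradient measured at that pessimistic point. The names sam_ascent_step, J, grad_J, lr and rho are hypothetical, and this is a generic adaptation, not the research team’s published algorithm.

```python
import math
from typing import Callable, List

Vector = List[float]


def sam_ascent_step(theta: Vector, grad_J: Callable[[Vector], Vector],
                    lr: float, rho: float) -> Vector:
    """One SAM-style step that maximizes a reward J instead of minimizing a loss."""
    g = grad_J(theta)
    norm = math.sqrt(sum(gi * gi for gi in g))
    if norm == 0.0:
        # No gradient direction available: skip the perturbation.
        eps = [0.0] * len(theta)
    else:
        # Pessimistic perturbation of radius rho, pointing against the reward gradient.
        eps = [-rho * gi / norm for gi in g]
    perturbed = [t + e for t, e in zip(theta, eps)]
    # Gradient ascent using the gradient measured at the perturbed point.
    g_adv = grad_J(perturbed)
    return [t + lr * gi for t, gi in zip(theta, g_adv)]


# Hypothetical toy "cumulative reward": a concave bowl peaking at (0, 0),
# four times steeper along the first parameter than along the second.
def J(theta: Vector) -> float:
    return -(4.0 * theta[0] ** 2 + 1.0 * theta[1] ** 2)


def grad_J(theta: Vector) -> Vector:
    return [-8.0 * theta[0], -2.0 * theta[1]]


theta = [1.0, 1.0]
for _ in range(200):
    theta = sam_ascent_step(theta, grad_J, lr=0.05, rho=0.05)
print("final theta:", theta, "J:", J(theta))
```

Because the perturbation always has radius rho, the parameters settle into a small neighborhood of the peak rather than exactly on it; in this toy run each coordinate ends within about lr × (curvature along that coordinate) × rho of zero. Larger rho values make each update more pessimistic, which is the usual knob for trading peak reward against robustness in SAM-style methods.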

Professor Sunghwan Yoon said, “This will be applied to robotics, autonomous driving, and other fields that require reinforcement learning models with a high level of generalization performance.”

3D Visualization Graph of the Cumulative Reward Function.

ICLR (International Conference on Learning Representations) is considered one of the world’s top three AI conferences, along with ICML and NeurIPS. ICLR 2025 will be held in Singapore from April 24 to 28, and out of 11,672 papers submitted worldwide, 3,646 were accepted.

This research was supported by the Institute of Information & Communications Technology Planning & Evaluation, the National Research Foundation of Korea, and UNIST.