Hanyang University announced on the 23rd that a research team led by Professor Choi Jeong-wook of the Department of Convergence Electronic Engineering has developed a knowledge distillation algorithm for quantization-aware training of language models at ultra-low precision (ternary quantization).


Schematic diagram of the research results. [Photo by Hanyang University]


Running inference on generative language models requires enormous storage space and incurs high computational costs. Weight quantization reduces the precision of the data used to store model weights while preserving the structure of the language model. One such method, ternary quantization, represents the model weights using only three values: -1, 0, and 1. However, this approach significantly degrades the model's original performance. To address this, researchers are actively studying quantization-aware training in which the quantized student model is trained with knowledge distillation techniques.
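To make the idea of three-valued weights concrete, the following is a minimal Python sketch of ternary quantization. The article does not say which ternarization rule the team used; this sketch assumes a common threshold-based heuristic (threshold at 0.7 times the mean absolute weight, plus a single floating-point scale), and the names `ternarize`, `dequantize`, and `threshold_ratio` are illustrative, not taken from the research.

```python
import numpy as np


def ternarize(weights, threshold_ratio=0.7):
    """Map a float weight array to codes in {-1, 0, 1} plus one float scale.

    Assumption: threshold-based heuristic, not necessarily the scheme
    used in the research described in the article.
    """
    abs_w = np.abs(weights)
    # Weights whose magnitude falls below this threshold are set to 0.
    delta = threshold_ratio * abs_w.mean()
    codes = np.zeros(weights.shape, dtype=np.int8)
    codes[weights > delta] = 1
    codes[weights < -delta] = -1
    # One scale per tensor: mean magnitude of the weights that survived.
    mask = codes != 0
    scale = float(abs_w[mask].mean()) if mask.any() else 0.0
    return codes, scale


def dequantize(codes, scale):
    """Rebuild an approximate float weight array from codes and scale."""
    return codes.astype(np.float32) * scale


# Toy example with random weights standing in for one layer of a model.
rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.02, size=(4, 8)).astype(np.float32)
codes, scale = ternarize(w)
w_hat = dequantize(codes, scale)
print(codes)
print("scale:", scale)
print("mean absolute error:", float(np.abs(w - w_hat).mean()))
```

Each weight can then be stored in about two bits instead of 16 or 32, which is where the storage savings come from. The reconstruction error printed at the end illustrates why accuracy drops without further training.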


The team proposed a token-level knowledge distillation technique (TSLD) for ternary quantization-aware training. It is designed around the characteristics that generative language models exhibit under quantization, with the aim of minimizing the resulting performance degradation. The approach delivered larger performance improvements than existing knowledge distillation techniques.
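As a rough illustration of what distillation at the token level means, the Python sketch below computes a separate distillation loss for each token position by comparing the output distributions of a floating-point teacher and a quantized student. It is a generic sketch, not the team's algorithm: the article does not describe how the method treats individual tokens, so the `token_weights` parameter is a hypothetical placeholder that defaults to uniform weighting, and all function names are illustrative.

```python
import numpy as np


def log_softmax(logits):
    """Numerically stable log-softmax over the last (vocabulary) axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def token_level_kd_loss(student_logits, teacher_logits, token_weights=None):
    """Per-token KL(teacher || student) and its weighted sum.

    Both logit arrays have shape (seq_len, vocab_size).
    token_weights is a hypothetical hook for per-token weighting;
    the default is uniform weighting (a plain average).
    """
    teacher_log_probs = log_softmax(teacher_logits)
    student_log_probs = log_softmax(student_logits)
    teacher_probs = np.exp(teacher_log_probs)
    # One loss value per token position.
    per_token = (teacher_probs * (teacher_log_probs - student_log_probs)).sum(axis=-1)
    if token_weights is None:
        token_weights = np.full(per_token.shape, 1.0 / per_token.size)
    return float((token_weights * per_token).sum()), per_token


# Toy example: 5 token positions, vocabulary of 10.
rng = np.random.default_rng(0)
teacher = rng.normal(size=(5, 10))
# Perturbed copy standing in for the logits of a quantized student.
student = teacher + 0.5 * rng.normal(size=(5, 10))
loss, per_token = token_level_kd_loss(student, teacher)
print("per-token loss:", per_token)
print("total loss:", loss)
```

In quantization-aware training, a loss of this kind would be minimized with respect to the student's weights while the teacher stays fixed.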



The research team validated the TSLD methodology on language modeling and commonsense reasoning tasks across various generative language models. The ternary-quantized models achieved top performance, with an accuracy drop of less than 1 compared to the floating-point models.


This content was produced with the assistance of AI translation services.

© The Asia Business Daily(www.asiae.co.kr). All rights reserved.
