Hanyang University Develops Quantization-Aware Knowledge Distillation Training Technique
Hanyang University announced on the 23rd that a research team led by Professor Choi Jeong-wook of the Department of Convergence Electronic Engineering has developed a knowledge distillation algorithm for quantization-aware training of ultra-low-precision (ternary-quantized) language models.
Running inference on generative language models requires enormous storage space and computation. Weight quantization compresses a model by reducing the precision of the data used to store its weights while leaving the structure of the language model unchanged. Among these techniques, ternary quantization represents the model weights using only three values: -1, 0, and 1. However, this approach significantly degrades the model's original performance. To address this, researchers are actively studying quantization-aware training in which a quantized student model learns from a full-precision teacher model through knowledge distillation.
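As a rough illustration of what representing weights with only -1, 0, and 1 means, here is a minimal sketch in plain Python. The article does not say which ternarization rule the team uses; the threshold rule below (0.7 times the mean absolute weight, with a single shared scale) is an assumption borrowed from a common scheme in the literature, and the names `ternarize`, `dequantize`, and `threshold_ratio` are hypothetical.

```python
def ternarize(weights, threshold_ratio=0.7):
    """Map each weight to -1, 0, or +1 and return (codes, scale).

    Assumption: the threshold is threshold_ratio * mean(|w|), and the
    scale is the mean magnitude of the weights that were not zeroed.
    This is one common rule, not necessarily the one used by the team.
    """
    if not weights:
        return [], 0.0
    mean_abs = sum(abs(w) for w in weights) / len(weights)
    threshold = threshold_ratio * mean_abs
    # Small weights become 0; the rest keep only their sign.
    codes = [0 if abs(w) <= threshold else (1 if w > 0 else -1) for w in weights]
    kept = [abs(w) for w, c in zip(weights, codes) if c != 0]
    scale = sum(kept) / len(kept) if kept else 0.0
    return codes, scale


def dequantize(codes, scale):
    """Reconstruct approximate weights from ternary codes and the shared scale."""
    return [scale * c for c in codes]


weights = [0.9, -0.05, -1.1, 0.02, 0.4, -0.6]
codes, scale = ternarize(weights)
print(codes)                     # ternary codes, one per weight
print(dequantize(codes, scale))  # coarse approximation of the original weights
```

Each weight then needs only a ternary code plus one shared scale per group, which is where the storage saving comes from. The gap between `weights` and the dequantized values is the quantization error that training has to compensate for.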
The team's token-level knowledge distillation technique for ternary quantization-aware training, TSLD, was designed around the characteristics that generative language models exhibit under quantization, and it minimizes the performance degradation that quantization causes. The approach achieved larger performance improvements than existing knowledge distillation techniques.
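To make "token-level" distillation concrete, the sketch below computes a generic logit distillation loss one token position at a time: at each position, the cross-entropy between the teacher's and the student's next-token distributions. This is not the team's algorithm. The article does not describe how their method treats individual tokens, so the optional `token_weights` argument is a hypothetical placeholder for any per-token weighting, and all function names are illustrative.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one token's vocabulary logits."""
    m = max(logits)
    log_total = math.log(sum(math.exp(x - m) for x in logits))
    return [x - m - log_total for x in logits]


def token_level_distillation_loss(teacher_logits, student_logits, token_weights=None):
    """Weighted sum over token positions of cross-entropy(teacher, student).

    teacher_logits, student_logits: lists of per-token logit lists
    (sequence length x vocabulary size).
    token_weights: hypothetical per-token weights; defaults to a uniform
    average over positions.
    """
    n = len(teacher_logits)
    if token_weights is None:
        token_weights = [1.0 / n] * n
    loss = 0.0
    for t_row, s_row, w in zip(teacher_logits, student_logits, token_weights):
        teacher_probs = [math.exp(lp) for lp in log_softmax(t_row)]
        student_log_probs = log_softmax(s_row)
        # Cross-entropy of the student's distribution against the teacher's.
        ce = -sum(p * lq for p, lq in zip(teacher_probs, student_log_probs))
        loss += w * ce
    return loss


teacher = [[2.0, 0.0, -1.0], [0.0, 0.0, 0.0]]       # full-precision teacher logits
student = [[0.0, 2.0, -1.0], [1.0, 0.0, 0.0]]       # quantized student logits
print(token_level_distillation_loss(teacher, student))
```

The loss is smallest when the student reproduces the teacher's distribution at every position, so minimizing it during quantization-aware training pulls the ternary student toward the full-precision teacher token by token.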
The research team validated the TSLD methodology on language modeling and commonsense reasoning tasks across various generative language models. The ternary-quantized model recorded top performance, with an accuracy drop of less than 1 relative to the floating-point model.
© The Asia Business Daily(www.asiae.co.kr). All rights reserved.