Rethinking Residual Errors in Compensation-based LLM Quantization [ICLR'26]

Motivation

We identify a missing residual error term in GPTAQ, which we name the 'Compensation-aware Error'. It arises from the discrepancy between the compensated and the original weights. By accounting for this term, our method strictly aligns with the original full-precision output at each column.
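For context, the sketch below is a toy NumPy version of the standard GPTQ-style column-wise compensation loop that GPTAQ builds on (it is not the ResComp algorithm itself; the quantizer, damping, and variable names are illustrative assumptions). It shows where the discrepancy comes from: compensating later columns mutates them away from the original weights, so subsequent columns are quantized against already-modified targets.

```python
import numpy as np

def columnwise_quant_with_compensation(W, X, levels=8, damp=1e-2):
    """Toy GPTQ-style column-wise quantization with error compensation.

    W: (rows, cols) weight matrix; X: (cols, samples) calibration inputs.
    Assumptions (not from the paper): uniform symmetric per-column
    quantizer, damped Hessian H = X X^T + damp * I.
    """
    W = W.astype(np.float64).copy()
    rows, cols = W.shape
    H = X @ X.T + damp * np.eye(cols)   # damped calibration Hessian
    Hinv = np.linalg.inv(H)
    Q = np.zeros_like(W)
    for j in range(cols):
        w = W[:, j]                      # NOTE: possibly already compensated,
                                         # i.e. it differs from the original
                                         # column -- the source of the
                                         # 'Compensation-aware Error'
        scale = np.abs(w).max() / (levels - 1) + 1e-12
        Q[:, j] = np.clip(np.round(w / scale), -(levels - 1), levels - 1) * scale
        err = (w - Q[:, j]) / Hinv[j, j]
        # propagate the residual into the not-yet-quantized columns
        W[:, j + 1:] -= np.outer(err, Hinv[j, j + 1:])
    return Q
```

ResComp's objective, as described above, additionally tracks the gap between these compensated columns and the original full-precision ones, rather than ignoring it.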

Scripts

Our codebase relies heavily on GPTAQ, with only minor modifications. Please see fake_quant/gptaq_utils_r.py for details.

Take weight-only quantization as an example:

cd fake_quant
bash weight_group_3bit.sh   ### per-group quantization
bash weight_group_2bit.sh   ### QuaRot + per-group quantization

Todo List

I'm continuously working on improving the stability of ResComp. Since we form a more precise optimization objective and use only 128 calibration samples, the method may be more sensitive to the quality of the calibration data. I warmly welcome further discussion; feel free to contact me (list@zju.edu.cn).

Acknowledgements

Our codebase is built heavily on previous works, and we would like to acknowledge and thank their great contributions:

  • GPTAQ: Efficient finetuning-free quantization for asymmetric calibration github
  • GPTQ: Accurate post-training quantization for generative pre-trained transformers github
  • QuaRot: Outlier-free 4-bit inference in rotated LLMs github
  • SpinQuant: LLM quantization with learned rotations github

Citation

If you find our work useful in your research, please kindly cite this paper:

@inproceedings{lirethinking,
  title={Rethinking Residual Errors in Compensation-based LLM Quantization},
  author={Li, Shuaiting and Deng, Juncan and Xu, Kedong and Deng, Rongtao and Gu, Hong and Jiang, Minghan and Shen, Haibin and Huang, Kejie},
  booktitle={The Fourteenth International Conference on Learning Representations}
}

Besides, if you are interested in vector quantization, check out our previous papers: SSVQ [ICCV'25], MVQ [ASPLOS'25], VQ4DiT [AAAI'25], ViM-VQ [ICCV'25]. I'm also seeking collaboration opportunities in CUDA kernel optimization to better support SSVQ.
