<h1 align="center">GPTQModel</h1>

<p align="center">Production-ready LLM model compression/quantization toolkit with accelerated inference support for both CPU and GPU via HF, vLLM, and SGLang.</p>
<p align="center">
    <a href="https://github.com/ModelCloud/GPTQModel/releases" style="text-decoration:none;"><img alt="GitHub release" src="https://img.shields.io/github/release/ModelCloud/GPTQModel.svg"></a>
    <a href="https://pypi.org/project/gptqmodel/" style="text-decoration:none;"><img alt="PyPI - Downloads" src="https://img.shields.io/pypi/dm/gptqmodel"></a>
</p>
GPTQModel will use Marlin, Exllama v2, and Triton kernels, in that order, for maximum inference performance.
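For orientation, here is a minimal sketch of the quantize/save/reload flow. It assumes the `GPTQModel.load` / `QuantizeConfig` / `BACKEND` API shown in the project's examples; the model id, output path, and calibration text are placeholders, and exact names may vary between releases.

```python
# Minimal GPTQModel sketch (assumed API; see the project's examples for
# the authoritative flow). Model id, paths, and calibration data are
# placeholders, not recommendations.
from gptqmodel import BACKEND, GPTQModel, QuantizeConfig

model_id = "facebook/opt-125m"     # placeholder source model
quant_path = "opt-125m-gptq-4bit"  # placeholder output directory

# 4-bit weights with group size 128 is the common GPTQ configuration.
quant_config = QuantizeConfig(bits=4, group_size=128)

# Load the full-precision model, quantize it against a small
# calibration set, then save the packed 4-bit checkpoint.
model = GPTQModel.load(model_id, quant_config)
model.quantize(["GPTQ is a post-training quantization method."])
model.save(quant_path)

# Reload for inference. Omitting `backend` lets GPTQModel pick the
# fastest available kernel (Marlin, then Exllama v2, then Triton);
# passing one pins the choice explicitly.
model = GPTQModel.load(quant_path, backend=BACKEND.MARLIN)
```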
* **Qwopqwop200**: for the quantization code adapted from [GPTQ-for-LLaMa](https://github.com/qwopqwop200/GPTQ-for-LLaMa/tree/cuda) and used in this project.
* **Turboderp**: for releasing the [Exllama v1](https://github.com/turboderp/exllama) and [Exllama v2](https://github.com/turboderp/exllamav2) kernels adapted for use in this project.
* **FpgaMiner**: for the [GPTQ-Triton](https://github.com/fpgaminer/GPTQ-triton) kernels used in [GPTQ-for-LLaMa](https://github.com/qwopqwop200/GPTQ-for-LLaMa/tree/cuda), which were adapted for use in this project.

## Cite
```
@article{frantar2024marlin,
  title={MARLIN: Mixed-Precision Auto-Regressive Parallel Inference on Large Language Models},
  author={Frantar, Elias and Castro, Roberto L and Chen, Jiale and Hoefler, Torsten and Alistarh, Dan},
  journal={arXiv preprint arXiv:2408.11743},
  year={2024}
}

@article{frantar-gptq,
  title={{GPTQ}: Accurate Post-training Compression for Generative Pretrained Transformers},
  author={Elias Frantar and Saleh Ashkboos and Torsten Hoefler and Dan Alistarh},
  year={2022},
  journal={arXiv preprint arXiv:2210.17323}
}
```