
Commit 1a20cc4

Add GPTQ/Marlin paper citation (#519)
1 parent c264777 commit 1a20cc4

1 file changed: +18 -1 lines changed


README.md

Lines changed: 18 additions & 1 deletion
````diff
@@ -1,5 +1,5 @@
 <h1 align="center">GPTQModel</h1>
-<p align="center">GPTQ based LLM model compression/quantization toolkit with accelerated inference support for both cpu/gpu via HF, vLLM, and SGLang.</p>
+<p align="center">Production ready LLM model compression/quantization toolkit with accelerated inference support for both cpu/gpu via HF, vLLM, and SGLang.</p>
 <p align="center">
 <a href="https://github.com/ModelCloud/GPTQModel/releases" style="text-decoration:none;"><img alt="GitHub release" src="https://img.shields.io/github/release/ModelCloud/GPTQModel.svg"></a>
 <a href="https://pypi.org/project/gptqmodel/" style="text-decoration:none;"><img alt="PyPI - Downloads" src="https://img.shields.io/pypi/dm/gptqmodel"></a>
@@ -286,3 +286,20 @@ GPTQModel will use Marlin, Exllama v2, Triton kernels in that order for maximum
 * **Qwopqwop200**: for quantization code used in this project adapted from [GPTQ-for-LLaMa](https://github.com/qwopqwop200/GPTQ-for-LLaMa/tree/cuda).
 * **Turboderp**: for releasing [Exllama v1](https://github.com/turboderp/exllama) and [Exllama v2](https://github.com/turboderp/exllamav2) kernels adapted for use in this project.
 * **FpgaMiner**: for [GPTQ-Triton](https://github.com/fpgaminer/GPTQ-triton) kernels used in [GPTQ-for-LLaMa](https://github.com/qwopqwop200/GPTQ-for-LLaMa/tree/cuda) which is adapted into this project.
+
+## Cite
+```
+@article{frantar2024marlin,
+  title={MARLIN: Mixed-Precision Auto-Regressive Parallel Inference on Large Language Models},
+  author={Frantar, Elias and Castro, Roberto L and Chen, Jiale and Hoefler, Torsten and Alistarh, Dan},
+  journal={arXiv preprint arXiv:2408.11743},
+  year={2024}
+}
+
+@article{frantar-gptq,
+  title={{GPTQ}: Accurate Post-training Compression for Generative Pretrained Transformers},
+  author={Elias Frantar and Saleh Ashkboos and Torsten Hoefler and Dan Alistarh},
+  year={2022},
+  journal={arXiv preprint arXiv:2210.17323}
+}
+```
````
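For context on the toolkit described by the updated tagline, here is a minimal quantization sketch in the style of the GPTQModel README. The model id, calibration slice, and config values are illustrative assumptions, and the `GPTQModel.load` / `QuantizeConfig` names follow gptqmodel's documented API, which may differ across versions.

```python
# Minimal sketch: 4-bit GPTQ quantization with gptqmodel.
# Model id, dataset slice, and config values are illustrative assumptions.
from datasets import load_dataset
from gptqmodel import GPTQModel, QuantizeConfig

# A small calibration set sampled from C4 (any representative text works).
calibration = load_dataset(
    "allenai/c4",
    data_files="en/c4-train.00001-of-01024.json.gz",
    split="train",
).select(range(512))["text"]

quant_config = QuantizeConfig(bits=4, group_size=128)  # 4-bit weights, scales per 128 columns

model = GPTQModel.load("meta-llama/Llama-3.2-1B-Instruct", quant_config)
model.quantize(calibration)  # run GPTQ calibration and pack the quantized weights
model.save("Llama-3.2-1B-Instruct-gptq-4bit")
```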
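The context line of the second hunk notes that Marlin, Exllama v2, and Triton kernels are tried in that order. Below is a hedged sketch of pinning the backend explicitly instead of relying on auto-selection; the `BACKEND` enum members are assumptions based on gptqmodel's README and may vary by release.

```python
# Sketch: force a specific inference kernel instead of the automatic
# Marlin > Exllama v2 > Triton selection. BACKEND member names are assumed.
from gptqmodel import BACKEND, GPTQModel

model = GPTQModel.load(
    "Llama-3.2-1B-Instruct-gptq-4bit",  # path saved by the quantization sketch above
    backend=BACKEND.MARLIN,             # pin the Marlin kernel explicitly
)

tokens = model.generate("Quantization reduces memory by")[0]
print(model.tokenizer.decode(tokens))
```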
