Let's port these two kernels over to GPTQModel as well for simple inference.
- AllSpark by @wyajieha: this kernel only supports bits=8, group_size=-1, and desc_act=False (see the eligibility sketch after this list). [Misc][Kernel]: Add GPTQAllSpark Quantization vllm-project/vllm#12931
- Exllama vLLM: structurally very different from the existing v1/v2. We need to benchmark and validate accuracy. If this kernel is good, let's retire Exllama v1/v2.
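
A minimal sketch of the AllSpark constraint above, assuming a simple quantization-config shape; the config class and function name here are hypothetical illustrations, not GPTQModel's actual dispatch code:

```python
# Hypothetical sketch of an AllSpark eligibility check; names are
# illustrative, not GPTQModel API.
from dataclasses import dataclass

@dataclass
class QuantConfig:
    bits: int = 4
    group_size: int = 128
    desc_act: bool = False

def allspark_supported(cfg: QuantConfig) -> bool:
    # Per the vLLM PR, AllSpark only handles 8-bit, per-channel
    # (group_size=-1), non-act-order (desc_act=False) GPTQ weights.
    return cfg.bits == 8 and cfg.group_size == -1 and not cfg.desc_act

assert allspark_supported(QuantConfig(bits=8, group_size=-1, desc_act=False))
assert not allspark_supported(QuantConfig(bits=4, group_size=128))
```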
HF Transformers uses GPTQModel kernels for GPTQ models, so this work will benefit all HF Transformers API/loading paths.
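
For context, a minimal sketch of that Transformers path (the model id below is a placeholder, not a real checkpoint; any pre-quantized GPTQ model on the Hub would load the same way and dispatch to GPTQModel kernels):

```python
# Minimal sketch: loading a pre-quantized GPTQ checkpoint through the
# plain HF Transformers API. "org/model-gptq" is a placeholder.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "org/model-gptq"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=16)[0]))
```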