Let's port these two kernels over to GPTQModel as well for simple inference.
- AllSpark by @wyajieha: this kernel only supports bits=8, group_size=-1, and desc_act=False (see the eligibility sketch after this list). [Misc][Kernel]: Add GPTQAllSpark Quantization vllm-project/vllm#12931
- Exllama vLLM: structurally very different from the existing v1/v2. We need to benchmark and validate accuracy. If this kernel is good, let's retire Exllama v1/v2.
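
A minimal sketch of the AllSpark constraint above, assuming a simple quantization-config shape; the config class and function name here are hypothetical illustrations, not GPTQModel's actual dispatch code:

```python
# Hypothetical sketch of an AllSpark eligibility check; names are
# illustrative, not GPTQModel API.
from dataclasses import dataclass

@dataclass
class QuantConfig:
    bits: int = 4
    group_size: int = 128
    desc_act: bool = False

def allspark_supported(cfg: QuantConfig) -> bool:
    # Per the vLLM PR, AllSpark only handles 8-bit, per-channel
    # (group_size=-1), non-act-order (desc_act=False) GPTQ weights.
    return cfg.bits == 8 and cfg.group_size == -1 and not cfg.desc_act

assert allspark_supported(QuantConfig(bits=8, group_size=-1, desc_act=False))
assert not allspark_supported(QuantConfig(bits=4, group_size=128))
```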
HF Transformers uses GPTQModel kernels for GPTQ models, so this work will benefit all HF Transformers API/loading paths.
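
For context, a minimal sketch of that Transformers path (the model id below is a placeholder, not a real checkpoint; any pre-quantized GPTQ model on the Hub would load the same way and dispatch to GPTQModel kernels):

```python
# Minimal sketch: loading a pre-quantized GPTQ checkpoint through the
# plain HF Transformers API. "org/model-gptq" is a placeholder.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "org/model-gptq"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=16)[0]))
```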