GPTQModel v2.1.0
What's Changed
✨ New QQQ quantization method and inference support! See the usage sketch after these highlights.
✨ New Google Gemma 3 day-zero model support.
✨ New Alibaba Ovis 2 VL model support.
✨ New AMD Instella day-zero model support.
✨ New GSM8K Platinum and MMLU-Pro benchmarking support. See the eval sketch after these highlights.
✨ PEFT LoRA training with GPTQModel is now 30%+ faster on all GPU and IPEX devices.
✨ Auto-detect MoE modules that were not activated during quantization due to insufficient calibration data.
✨ ROCm setup.py compat fixes.
✨ Optimum and PEFT compat fixes.
✨ Fixed PEFT bfloat16 training.
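
A minimal sketch of quantizing a model with the new QQQ method, following GPTQModel's usual load → quantize → save flow. The way QQQ is selected here (a `quant_method` value on `QuantizeConfig`), the model id, and the tiny calibration list are illustrative assumptions, not verified API:

```python
# Minimal sketch (not verified against the released API): quantize a model
# with the new QQQ method using GPTQModel's load -> quantize -> save flow.
from gptqmodel import GPTQModel, QuantizeConfig

# Toy calibration data; real runs need a few hundred representative samples.
calibration = [
    "GPTQModel quantizes large language models with minimal accuracy loss.",
    "QQQ is one of the quantization methods supported by GPTQModel.",
]

# Assumption: QQQ is selected via the quant_method field on QuantizeConfig.
quant_config = QuantizeConfig(bits=4, group_size=128, quant_method="qqq")

model = GPTQModel.load("meta-llama/Llama-3.2-1B-Instruct", quant_config)
model.quantize(calibration)
model.save("Llama-3.2-1B-Instruct-qqq-4bit")

# Inference with the quantized checkpoint.
model = GPTQModel.load("Llama-3.2-1B-Instruct-qqq-4bit")
tokens = model.generate("The capital of France is")[0]
print(model.tokenizer.decode(tokens))
```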
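
And a similarly hedged sketch of running the new GSM8K Platinum benchmark through `GPTQModel.eval()`'s lm-eval integration; the import path, task identifier, and `output_file` argument are assumptions and may differ from the released names (MMLU-Pro is expected to be exposed the same way):

```python
# Minimal sketch (names are assumptions): run the new GSM8K Platinum benchmark
# on a quantized model through GPTQModel's lm-eval integration.
from gptqmodel import GPTQModel
from gptqmodel.utils.eval import EVAL  # assumption: location of the EVAL enum

results = GPTQModel.eval(
    "ModelCloud/Llama-3.2-1B-Instruct-gptqmodel-4bit",  # any GPTQModel-quantized model id
    framework=EVAL.LM_EVAL,               # route the run through the lm-eval harness
    tasks=[EVAL.LM_EVAL.GSM8K_PLATINUM],  # assumption: enum name for GSM8K Platinum
    output_file="gsm8k_platinum.json",    # assumption: optional results dump
)
print(results)
```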
- auto enable flash_attn only when flash-attn was installed by @CSY-ModelCloud in #1372
- Fix rocm compat by @Qubitium in #1373
- fix unnecessary mkdir by @CSY-ModelCloud in #1374
- add test_kernel_output_xpu.py by @CSY-ModelCloud in #1382
- clean test_kernel_output_xpu.py by @CSY-ModelCloud in #1383
- remove xpu support of triton kernel by @Qubitium in #1384
- [MODEL] Add instella support by @LRL-ModelCloud in #1385
- Fix optimum/peft trainer integration by @CSY-ModelCloud in #1381
- rename peft test file by @CSY-ModelCloud in #1387
- [CI] fix wandb was not installed & update test_olora_finetuning_xpu.py by @CSY-ModelCloud in #1388
- Add lm-eval GSM8k Platinum by @Qubitium in #1394
- Remove cuda kernel by @Qubitium in #1396
- fix exllama kernels not compiled by @Qubitium in #1397
- update tests by @Qubitium in #1398
- make the kernel output validation more robust by @Qubitium in #1399
- speed up ci by @Qubitium in #1400
- add fwd counter by @yuchiwang in #1389
- allow triton and ipex to inherit torch kernel and use torch for train… by @Qubitium in #1401
- fix skip moe modules when fwd count is 0 by @Qubitium in #1404
- fix ipex linear post init for finetune by @jiqing-feng in #1406
- fix optimum compat by @Qubitium in #1408
- [Feature] Add mmlupro API by @CL-ModelCloud in #1405
- add training callback by @CSY-ModelCloud in #1409
- Fix bf16 training by @Qubitium in #1410
- fix bf16 forward for triton by @Qubitium in #1411
- Add QQQ by @Qubitium in #1402
- make IPEX or any kernel that uses Torch for Training to auto switch v… by @Qubitium in #1412
- [CI] xpu inference test by @CL-ModelCloud in #1380
- [FIX] qqq with eora by @ZX-ModelCloud in #1415
- [FIX] device error by @ZX-ModelCloud in #1417
- make quant linear expose internal buffers by @Qubitium in #1418
- Fix bfloat16 kernels by @Qubitium in #1420
- fix qqq bfloat16 forward by @Qubitium in #1423
- Fix ci10 by @Qubitium in #1424
- fix marlin bf16 compat by @Qubitium in #1427
- [CI] no need reinstall requirements by @CSY-ModelCloud in #1426
- [FIX] dynamic save error by @ZX-ModelCloud in #1428
- [FIX] super().post_init() calling order by @ZX-ModelCloud in #1431
- fix bitblas choose IPEX in cuda env by @CSY-ModelCloud in #1432
- Fix exllama is not packable by @Qubitium in #1433
- disable exllama for training by @Qubitium in #1435
- remove TritonV2QuantLinear for xpu test by @CSY-ModelCloud in #1436
- [MODEL] add gemma3 support by @LRL-ModelCloud in #1434
- fix the error when downloading models using modelscope by @mushenL in #1437
- Add QQQ Rotation by @ZX-ModelCloud in #1425
- fix no init.py by @CSY-ModelCloud in #1438
- Fix hadamard import by @Qubitium in #1441
- Eora final by @nbasyl in #1440
- triton is not validated for ipex by @Qubitium in #1445
- Fix exllama adapter by @Qubitium in #1446
- fix rocm compile by @Qubitium in #1447
- [FIX] Correctly obtain the submodule's device by @ZX-ModelCloud in #1448
- fix rocm not compatible with exllama v2 and eora kernel by @Qubitium in #1449
- revert overflow code by @Qubitium in #1450
- add kernel dtype support and add full float16 vs bfloat16 kernel testing by @Qubitium in #1452
- [MODEL] add Ovis2 support and bug fix by @Fusionplay in #1454
- add unit test for ovis2 by @CSY-ModelCloud in #1456
New Contributors
- @yuchiwang made their first contribution in #1389
- @mushenL made their first contribution in #1437
- @nbasyl made their first contribution in #1440
- @Fusionplay made their first contribution in #1454
Full Changelog: v2.0.0...v2.1.0