GPTQModel v2.1.0
What's Changed
✨ New QQQ quantization method and inference support! See the usage sketch after these highlights.
✨ New Google Gemma 3 day-zero model support.
✨ New Alibaba Ovis 2 VL model support.
✨ New AMD Instella day-zero model support.
✨ New GSM8K Platinum and MMLU-Pro benchmarking support. See the eval sketch after these highlights.
✨ PEFT LoRA training with GPTQModel is now 30%+ faster on all GPU and IPEX devices.
✨ Auto-detect MoE modules that were not activated during quantization due to insufficient calibration data.
✨ ROCm setup.py compat fixes.
✨ Optimum and PEFT compat fixes.
✨ Fixed PEFT bfloat16 training.
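
A minimal sketch of quantizing a model with the new QQQ method, following GPTQModel's usual load → quantize → save flow. The way QQQ is selected here (a `quant_method` value on `QuantizeConfig`), the model id, and the tiny calibration list are illustrative assumptions, not verified API:

```python
# Minimal sketch (not verified against the released API): quantize a model
# with the new QQQ method using GPTQModel's load -> quantize -> save flow.
from gptqmodel import GPTQModel, QuantizeConfig

# Toy calibration data; real runs need a few hundred representative samples.
calibration = [
    "GPTQModel quantizes large language models with minimal accuracy loss.",
    "QQQ is one of the quantization methods supported by GPTQModel.",
]

# Assumption: QQQ is selected via the quant_method field on QuantizeConfig.
quant_config = QuantizeConfig(bits=4, group_size=128, quant_method="qqq")

model = GPTQModel.load("meta-llama/Llama-3.2-1B-Instruct", quant_config)
model.quantize(calibration)
model.save("Llama-3.2-1B-Instruct-qqq-4bit")

# Inference with the quantized checkpoint.
model = GPTQModel.load("Llama-3.2-1B-Instruct-qqq-4bit")
tokens = model.generate("The capital of France is")[0]
print(model.tokenizer.decode(tokens))
```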
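
And a similarly hedged sketch of running the new GSM8K Platinum benchmark through `GPTQModel.eval()`'s lm-eval integration; the import path, task identifier, and `output_file` argument are assumptions and may differ from the released names (MMLU-Pro is expected to be exposed the same way):

```python
# Minimal sketch (names are assumptions): run the new GSM8K Platinum benchmark
# on a quantized model through GPTQModel's lm-eval integration.
from gptqmodel import GPTQModel
from gptqmodel.utils.eval import EVAL  # assumption: location of the EVAL enum

results = GPTQModel.eval(
    "ModelCloud/Llama-3.2-1B-Instruct-gptqmodel-4bit",  # any GPTQModel-quantized model id
    framework=EVAL.LM_EVAL,               # route the run through the lm-eval harness
    tasks=[EVAL.LM_EVAL.GSM8K_PLATINUM],  # assumption: enum name for GSM8K Platinum
    output_file="gsm8k_platinum.json",    # assumption: optional results dump
)
print(results)
```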
- auto enable flash_attn only when flash-attn was installed by @CSY-ModelCloud in #1372
- Fix rocm compat by @Qubitium in #1373
- fix unnecessary mkdir by @CSY-ModelCloud in #1374
- add test_kernel_output_xpu.py by @CSY-ModelCloud in #1382
- clean test_kernel_output_xpu.py by @CSY-ModelCloud in #1383
- remove xpu support of triton kernel by @Qubitium in #1384
- [MODEL] Add instella support by @LRL-ModelCloud in #1385
- Fix optimum/peft trainer integration by @CSY-ModelCloud in #1381
- rename peft test file by @CSY-ModelCloud in #1387
- [CI] fix wandb was not installed & update test_olora_finetuning_xpu.py by @CSY-ModelCloud in #1388
- Add lm-eval GSM8k Platinum by @Qubitium in #1394
- Remove cuda kernel by @Qubitium in #1396
- fix exllama kernels not compiled by @Qubitium in #1397
- update tests by @Qubitium in #1398
- make the kernel output validation more robust by @Qubitium in #1399
- speed up ci by @Qubitium in #1400
- add fwd counter by @yuchiwang in #1389
- allow triton and ipex to inherit torch kernel and use torch for train… by @Qubitium in #1401
- fix skip moe modules when fwd count is 0 by @Qubitium in #1404
- fix ipex linear post init for finetune by @jiqing-feng in #1406
- fix optimum compat by @Qubitium in #1408
- [Feature] Add mmlupro API by @CL-ModelCloud in #1405
- add training callback by @CSY-ModelCloud in #1409
- Fix bf16 training by @Qubitium in #1410
- fix bf16 forward for triton by @Qubitium in #1411
- Add QQQ by @Qubitium in #1402
- make IPEX or any kernel that uses Torch for Training to auto switch v… by @Qubitium in #1412
- [CI] xpu inference test by @CL-ModelCloud in #1380
- [FIX] qqq with eora by @ZX-ModelCloud in #1415
- [FIX] device error by @ZX-ModelCloud in #1417
- make quant linear expose internal buffers by @Qubitium in #1418
- Fix bfloat16 kernels by @Qubitium in #1420
- fix qqq bfloat16 forward by @Qubitium in #1423
- Fix ci10 by @Qubitium in #1424
- fix marlin bf16 compat by @Qubitium in #1427
- [CI] no need reinstall requirements by @CSY-ModelCloud in #1426
- [FIX] dynamic save error by @ZX-ModelCloud in #1428
- [FIX] super().post_init() calling order by @ZX-ModelCloud in #1431
- fix bitblas choose IPEX in cuda env by @CSY-ModelCloud in #1432
- Fix exllama is not packable by @Qubitium in #1433
- disable exllama for training by @Qubitium in #1435
- remove TritonV2QuantLinear for xpu test by @CSY-ModelCloud in #1436
- [MODEL] add gemma3 support by @LRL-ModelCloud in #1434
- fix the error when downloading models using modelscope by @mushenL in #1437
- Add QQQ Rotation by @ZX-ModelCloud in #1425
- fix no init.py by @CSY-ModelCloud in #1438
- Fix hadamard import by @Qubitium in #1441
- Eora final by @nbasyl in #1440
- triton is not validated for ipex by @Qubitium in #1445
- Fix exllama adapter by @Qubitium in #1446
- fix rocm compile by @Qubitium in #1447
- [FIX] Correctly obtain the submodule's device by @ZX-ModelCloud in #1448
- fix rocm not compatible with exllama v2 and eora kernel by @Qubitium in #1449
- revert overflow code by @Qubitium in #1450
- add kernel dtype support and add full float16 vs bfloat16 kernel testing by @Qubitium in #1452
- [MODEL] add Ovis2 support and bug fix by @Fusionplay in #1454
- add unit test for ovis2 by @CSY-ModelCloud in #1456
New Contributors
- @yuchiwang made their first contribution in #1389
- @mushenL made their first contribution in #1437
- @nbasyl made their first contribution in #1440
- @Fusionplay made their first contribution in #1454
Full Changelog: v2.0.0...v2.1.0