# Description

## Expected Behavior
I expected `finetune` to produce a usable LoRA adapter for all supported models.

## Current Behavior
For Mistral models (I tried both Mistral and Zephyr, in Q8_0, Q5_K_M, and Q5_0 quantizations), the model outputs gibberish with the LoRA applied after a single finetune iteration.
On the same PC, finetuning produces a usable LoRA adapter for TinyLlama (I tried Q8_0, Q5_K_M, and Q5_0).

First few tokens generated for the prompt "Building a website can be done in 10 simple steps:":
Base Mistral model:

```
Building a website can be done in 10 simple steps:
1. Come up with an idea for your site.
2. Do some research on the web to see what’s out there.
```

Mistral with LoRA (a single finetune iteration on shakespeare.txt from the finetune example):

```
Building a website can be done in 10 simple steps: (3 . in.
A,
! (
P! A, PAM,IT A) MER W W 0
```
# Environment and Context
- Physical (or virtual) hardware you are using, e.g. for Linux:
Core i7-4770 CPU
```
$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 39 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Vendor ID: GenuineIntel
Model name: Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz
CPU family: 6
Model: 60
Thread(s) per core: 2
Core(s) per socket: 4
Socket(s): 1
Stepping: 3
CPU(s) scaling MHz: 100%
CPU max MHz: 3900.0000
CPU min MHz: 800.0000
BogoMIPS: 6784.88
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm abm cpuid_fault invpcid_single pti tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm xsaveopt dtherm ida arat pln pts
Virtualization features:
Virtualization: VT-x
Caches (sum of all):
L1d: 128 KiB (4 instances)
L1i: 128 KiB (4 instances)
L2: 1 MiB (4 instances)
L3: 8 MiB (1 instance)
NUMA:
NUMA node(s): 1
NUMA node0 CPU(s): 0-7
Vulnerabilities:
Itlb multihit: KVM: Mitigation: VMX disabled
L1tf: Mitigation; PTE Inversion; VMX conditional cache flushes, SMT vulnerable
Mds: Vulnerable: Clear CPU buffers attempted, no microcode; SMT vulnerable
Meltdown: Mitigation; PTI
Mmio stale data: Unknown: No mitigations
Retbleed: Not affected
Spec store bypass: Vulnerable
Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Spectre v2: Mitigation; Retpolines, STIBP disabled, RSB filling, PBRSB-eIBRS Not affected
Srbds: Vulnerable: No microcode
Tsx async abort: Vulnerable: Clear CPU buffers attempted, no microcode; SMT vulnerable
```
- Operating System, e.g. for Linux:
```
$ uname -a
Linux maxxk-pc 6.1.29 #1-NixOS SMP PREEMPT_DYNAMIC Wed May 17 09:54:00 UTC 2023 x86_64 GNU/Linux
```
# Failure Information (for bugs)

For Mistral models (both Mistral and Zephyr, in Q8_0, Q5_K_M, and Q5_0 quantizations), the model outputs gibberish with the LoRA applied after a single finetune iteration; the same procedure produces a working adapter for TinyLlama.
# Steps to Reproduce
I used pre-converted models from TheBloke:
- Mistral: https://huggingface.co/TheBloke/Mistral-7B-v0.1-GGUF/tree/main
- Zephyr: https://huggingface.co/TheBloke/zephyr-7B-beta-GGUF/tree/main
The issue can be reproduced with shakespeare.txt from the finetune example, but I got the same results with a different dataset.
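For completeness, the training file can be fetched the same way the finetune example's README suggests (URL copied from the example at the time of writing; adjust if it has moved):

```sh
# Shakespeare dataset referenced by the finetune example
wget https://raw.githubusercontent.com/brunoklein99/deep-learning-notes/master/shakespeare.txt
```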
Finetuning command:
```sh
../llama.cpp/bin/finetune \
  --model-base mistral-7b-v0.1.Q8_0.gguf \
  --train-data shakespeare.txt \
  --lora-out lora-Q8_0.gguf \
  --save-every 1 \
  --threads 4 \
  --ctx 64 \
  --batch 1 \
  --grad-acc 1 \
  --lora-r 64 \
  --lora-alpha 64 \
  --adam-iter 1 \
  --use-checkpointing \
  --use-flash \
  --escape \
  --seed 1
```
For Zephyr (which also produces an invalid LoRA) and TinyLlama (which produces a valid one) I changed only the `--model-base` parameter. Between experiments I removed all finetune checkpoints and LoRA files; a sketch of the full loop follows.
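In case it helps anyone reproduce all three cases, the loop below is roughly what I ran; the Zephyr and TinyLlama file names are placeholders for whichever quantization you downloaded, and the `checkpoint-*.gguf` pattern assumes the default `--checkpoint-out` naming:

```sh
# Sketch: one finetune iteration per base model, adapters kept separate.
# Replace the file names with your local model copies.
for base in mistral-7b-v0.1.Q8_0.gguf \
            zephyr-7b-beta.Q8_0.gguf \
            tinyllama-1.1b-chat-v0.3.Q8_0.gguf; do
  rm -f checkpoint-*.gguf   # clean state between experiments
  ../llama.cpp/bin/finetune \
    --model-base "$base" \
    --train-data shakespeare.txt \
    --lora-out "lora-${base%.gguf}.gguf" \
    --save-every 1 --threads 4 --ctx 64 --batch 1 --grad-acc 1 \
    --lora-r 64 --lora-alpha 64 --adam-iter 1 \
    --use-checkpointing --use-flash --escape --seed 1
done
```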
Testing without LoRA:

```sh
../llama.cpp/bin/main -m ./mistral-7b-v0.1.Q8_0.gguf -p "Building a website can be done in 10 simple steps:"
```

Testing with LoRA:

```sh
../llama.cpp/bin/main -m ./mistral-7b-v0.1.Q8_0.gguf -p "Building a website can be done in 10 simple steps:" --lora ./lora-Q8_0.gguf
```
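Since `main` samples stochastically by default, pinning the seed makes the with/without-LoRA comparison easier to eyeball; a minimal sketch (output file names are mine):

```sh
# Run the same prompt and seed with and without the adapter, then compare.
PROMPT="Building a website can be done in 10 simple steps:"
../llama.cpp/bin/main -m ./mistral-7b-v0.1.Q8_0.gguf -p "$PROMPT" \
  --seed 1 -n 64 > out-base.txt
../llama.cpp/bin/main -m ./mistral-7b-v0.1.Q8_0.gguf -p "$PROMPT" \
  --seed 1 -n 64 --lora ./lora-Q8_0.gguf > out-lora.txt
diff out-base.txt out-lora.txt
```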
P.S. To close this bug report, I would like to thank all contributors for this amazing piece of software. It is a pleasure to use, and it makes experimenting with LLMs possible even for those of us without top-end GPUs.