Description
I used the convert-bloom-hf-to-gguf.py script to convert the Huggingface bigscience/bloom-7b1 model to GGUF with f16 weights (the trailing 1 selects f16 output):
python convert-bloom-hf-to-gguf.py models/bloom-7b1/ 1
The conversion succeeds and produces models/bloom-7b1/ggml-model-f16.gguf.
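As a quick sanity check on the converted file, the GGUF header can be read directly. This is a minimal sketch that assumes only the fixed GGUF v2 header layout (4-byte magic, uint32 version, uint64 tensor count, uint64 key-value count, little-endian):

import struct

# Read the fixed-size GGUF v2 header fields from the converted file.
with open("models/bloom-7b1/ggml-model-f16.gguf", "rb") as f:
    magic = f.read(4)
    version, n_tensors, n_kv = struct.unpack("<IQQ", f.read(20))

print(magic, version, n_tensors, n_kv)
# b'GGUF' 2 366 19 -- consistent with the llama_model_loader output below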
The resulting model loads and runs correctly on the CPU. However, when I try to offload a layer to the GPU, I get the following error:
GGML_ASSERT: /llama.cpp/ggml-cuda.cu:6115: false
Environment and Context
- Physical (or virtual) hardware you are using, e.g. for Linux:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 46 bits physical, 48 bits virtual
CPU(s): 64
On-line CPU(s) list: 0-63
Thread(s) per core: 2
Core(s) per socket: 16
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 79
Model name: Intel(R) Xeon(R) CPU E5-2683 v4 @ 2.10GHz
Stepping: 1
CPU MHz: 1197.469
CPU max MHz: 3000.0000
CPU min MHz: 1200.0000
BogoMIPS: 4190.27
Virtualization: VT-x
L1d cache: 1 MiB
L1i cache: 1 MiB
L2 cache: 8 MiB
L3 cache: 80 MiB
NUMA node0 CPU(s): 0-15,32-47
NUMA node1 CPU(s): 16-31,48-63
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit: KVM: Mitigation: Split huge pages
Vulnerability L1tf: Mitigation; PTE Inversion; VMX conditional cache flushes, SMT vulnerable
Vulnerability Mds: Mitigation; Clear CPU buffers; SMT vulnerable
Vulnerability Meltdown: Mitigation; PTI
Vulnerability Mmio stale data: Mitigation; Clear CPU buffers; SMT vulnerable
Vulnerability Retbleed: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Retpolines, IBPB conditional, IBRS_FW, STIBP conditional, RSB filling, PBRSB-eIBRS Not affected
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Mitigation; Clear CPU buffers; SMT vulnerable
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single pti intel_ppin ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm rdt_a rdseed adx smap intel_pt xsaveopt cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts md_clear flush_l1d
- Operating System, e.g. for Linux:
Linux nemo 5.4.0-165-generic #182-Ubuntu SMP Mon Oct 2 19:43:28 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
- SDK version, e.g. for Linux:
Python 3.10.13
GNU Make 4.2.1
g++ (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0
Steps to Reproduce
- Clone bigscience/bloom-7b1 from Huggingface (https://huggingface.co/bigscience/bloom-7b1).
- Use convert-bloom-hf-to-gguf.py to convert it to f16 GGUF.
- Try to load the model on the GPU:
./build/bin/main -m models/bloom-7b1/ggml-model-f16.gguf -n 256 -b 512 -c 512 -f ../prompt.txt --threads 32 --temp 0.1 --top-p 0.75 --top-k 40 -cb -ngl 33
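The binary was built with cuBLAS enabled (see ggml_init_cublas in the log below). A build configuration along these lines reproduces the setup; this is a sketch, since LLAMA_CUBLAS was the CUDA switch at this build, and the exact invocation may differ:
cmake -B build -DLLAMA_CUBLAS=ON
cmake --build build --config Release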
Failure Logs
./build/bin/main -m models/bloom-7b1/ggml-model-f16.gguf -n 256 -b 512 -c 512 -f ../prompt.txt --threads 32 --temp 0.1 --top-p 0.75 --top-k 40 -cb -ngl 33
Log start
main: build = 1399 (004797f)
main: built with cc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0 for x86_64-linux-gnu
main: seed = 1697803662
ggml_init_cublas: found 6 CUDA devices:
Device 0: NVIDIA TITAN RTX, compute capability 7.5
Device 1: NVIDIA GeForce RTX 2080 Ti, compute capability 7.5
Device 2: NVIDIA TITAN Xp, compute capability 6.1
Device 3: NVIDIA GeForce GTX 1080 Ti, compute capability 6.1
Device 4: NVIDIA GeForce GTX 1080 Ti, compute capability 6.1
Device 5: NVIDIA GeForce GTX 1080 Ti, compute capability 6.1
llama_model_loader: loaded meta data with 19 key-value pairs and 366 tensors from models/bloom-7b1/ggml-model-f16.gguf (version GGUF V2 (latest))
llama_model_loader: - tensor 0: token_embd.weight f16 [ 4096, 250880, 1, 1 ]
llama_model_loader: - tensor 1: output.weight f16 [ 4096, 250880, 1, 1 ]
llama_model_loader: - tensor 2: token_embd_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 3: token_embd_norm.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 4: blk.0.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 5: blk.0.attn_norm.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 6: blk.0.attn_qkv.weight f16 [ 4096, 12288, 1, 1 ]
llama_model_loader: - tensor 7: blk.0.attn_qkv.bias f32 [ 12288, 1, 1, 1 ]
llama_model_loader: - tensor 8: blk.0.attn_output.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 9: blk.0.attn_output.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 10: blk.0.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 11: blk.0.ffn_norm.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 12: blk.0.ffn_up.weight f16 [ 4096, 16384, 1, 1 ]
llama_model_loader: - tensor 13: blk.0.ffn_up.bias f32 [ 16384, 1, 1, 1 ]
llama_model_loader: - tensor 14: blk.0.ffn_down.weight f16 [ 16384, 4096, 1, 1 ]
llama_model_loader: - tensor 15: blk.0.ffn_down.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 16: blk.1.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 17: blk.1.attn_norm.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 18: blk.1.attn_qkv.weight f16 [ 4096, 12288, 1, 1 ]
llama_model_loader: - tensor 19: blk.1.attn_qkv.bias f32 [ 12288, 1, 1, 1 ]
llama_model_loader: - tensor 20: blk.1.attn_output.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 21: blk.1.attn_output.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 22: blk.1.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 23: blk.1.ffn_norm.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 24: blk.1.ffn_up.weight f16 [ 4096, 16384, 1, 1 ]
llama_model_loader: - tensor 25: blk.1.ffn_up.bias f32 [ 16384, 1, 1, 1 ]
llama_model_loader: - tensor 26: blk.1.ffn_down.weight f16 [ 16384, 4096, 1, 1 ]
llama_model_loader: - tensor 27: blk.1.ffn_down.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 28: blk.2.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 29: blk.2.attn_norm.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 30: blk.2.attn_qkv.weight f16 [ 4096, 12288, 1, 1 ]
llama_model_loader: - tensor 31: blk.2.attn_qkv.bias f32 [ 12288, 1, 1, 1 ]
llama_model_loader: - tensor 32: blk.2.attn_output.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 33: blk.2.attn_output.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 34: blk.2.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 35: blk.2.ffn_norm.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 36: blk.2.ffn_up.weight f16 [ 4096, 16384, 1, 1 ]
llama_model_loader: - tensor 37: blk.2.ffn_up.bias f32 [ 16384, 1, 1, 1 ]
llama_model_loader: - tensor 38: blk.2.ffn_down.weight f16 [ 16384, 4096, 1, 1 ]
llama_model_loader: - tensor 39: blk.2.ffn_down.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 40: blk.3.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 41: blk.3.attn_norm.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 42: blk.3.attn_qkv.weight f16 [ 4096, 12288, 1, 1 ]
llama_model_loader: - tensor 43: blk.3.attn_qkv.bias f32 [ 12288, 1, 1, 1 ]
llama_model_loader: - tensor 44: blk.3.attn_output.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 45: blk.3.attn_output.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 46: blk.3.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 47: blk.3.ffn_norm.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 48: blk.3.ffn_up.weight f16 [ 4096, 16384, 1, 1 ]
llama_model_loader: - tensor 49: blk.3.ffn_up.bias f32 [ 16384, 1, 1, 1 ]
llama_model_loader: - tensor 50: blk.3.ffn_down.weight f16 [ 16384, 4096, 1, 1 ]
llama_model_loader: - tensor 51: blk.3.ffn_down.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 52: blk.4.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 53: blk.4.attn_norm.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 54: blk.4.attn_qkv.weight f16 [ 4096, 12288, 1, 1 ]
llama_model_loader: - tensor 55: blk.4.attn_qkv.bias f32 [ 12288, 1, 1, 1 ]
llama_model_loader: - tensor 56: blk.4.attn_output.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 57: blk.4.attn_output.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 58: blk.4.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 59: blk.4.ffn_norm.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 60: blk.4.ffn_up.weight f16 [ 4096, 16384, 1, 1 ]
llama_model_loader: - tensor 61: blk.4.ffn_up.bias f32 [ 16384, 1, 1, 1 ]
llama_model_loader: - tensor 62: blk.4.ffn_down.weight f16 [ 16384, 4096, 1, 1 ]
llama_model_loader: - tensor 63: blk.4.ffn_down.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 64: blk.5.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 65: blk.5.attn_norm.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 66: blk.5.attn_qkv.weight f16 [ 4096, 12288, 1, 1 ]
llama_model_loader: - tensor 67: blk.5.attn_qkv.bias f32 [ 12288, 1, 1, 1 ]
llama_model_loader: - tensor 68: blk.5.attn_output.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 69: blk.5.attn_output.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 70: blk.5.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 71: blk.5.ffn_norm.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 72: blk.5.ffn_up.weight f16 [ 4096, 16384, 1, 1 ]
llama_model_loader: - tensor 73: blk.5.ffn_up.bias f32 [ 16384, 1, 1, 1 ]
llama_model_loader: - tensor 74: blk.5.ffn_down.weight f16 [ 16384, 4096, 1, 1 ]
llama_model_loader: - tensor 75: blk.5.ffn_down.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 76: blk.6.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 77: blk.6.attn_norm.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 78: blk.6.attn_qkv.weight f16 [ 4096, 12288, 1, 1 ]
llama_model_loader: - tensor 79: blk.6.attn_qkv.bias f32 [ 12288, 1, 1, 1 ]
llama_model_loader: - tensor 80: blk.6.attn_output.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 81: blk.6.attn_output.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 82: blk.6.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 83: blk.6.ffn_norm.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 84: blk.6.ffn_up.weight f16 [ 4096, 16384, 1, 1 ]
llama_model_loader: - tensor 85: blk.6.ffn_up.bias f32 [ 16384, 1, 1, 1 ]
llama_model_loader: - tensor 86: blk.6.ffn_down.weight f16 [ 16384, 4096, 1, 1 ]
llama_model_loader: - tensor 87: blk.6.ffn_down.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 88: blk.7.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 89: blk.7.attn_norm.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 90: blk.7.attn_qkv.weight f16 [ 4096, 12288, 1, 1 ]
llama_model_loader: - tensor 91: blk.7.attn_qkv.bias f32 [ 12288, 1, 1, 1 ]
llama_model_loader: - tensor 92: blk.7.attn_output.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 93: blk.7.attn_output.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 94: blk.7.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 95: blk.7.ffn_norm.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 96: blk.7.ffn_up.weight f16 [ 4096, 16384, 1, 1 ]
llama_model_loader: - tensor 97: blk.7.ffn_up.bias f32 [ 16384, 1, 1, 1 ]
llama_model_loader: - tensor 98: blk.7.ffn_down.weight f16 [ 16384, 4096, 1, 1 ]
llama_model_loader: - tensor 99: blk.7.ffn_down.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 100: blk.8.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 101: blk.8.attn_norm.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 102: blk.8.attn_qkv.weight f16 [ 4096, 12288, 1, 1 ]
llama_model_loader: - tensor 103: blk.8.attn_qkv.bias f32 [ 12288, 1, 1, 1 ]
llama_model_loader: - tensor 104: blk.8.attn_output.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 105: blk.8.attn_output.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 106: blk.8.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 107: blk.8.ffn_norm.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 108: blk.8.ffn_up.weight f16 [ 4096, 16384, 1, 1 ]
llama_model_loader: - tensor 109: blk.8.ffn_up.bias f32 [ 16384, 1, 1, 1 ]
llama_model_loader: - tensor 110: blk.8.ffn_down.weight f16 [ 16384, 4096, 1, 1 ]
llama_model_loader: - tensor 111: blk.8.ffn_down.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 112: blk.9.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 113: blk.9.attn_norm.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 114: blk.9.attn_qkv.weight f16 [ 4096, 12288, 1, 1 ]
llama_model_loader: - tensor 115: blk.9.attn_qkv.bias f32 [ 12288, 1, 1, 1 ]
llama_model_loader: - tensor 116: blk.9.attn_output.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 117: blk.9.attn_output.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 118: blk.9.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 119: blk.9.ffn_norm.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 120: blk.9.ffn_up.weight f16 [ 4096, 16384, 1, 1 ]
llama_model_loader: - tensor 121: blk.9.ffn_up.bias f32 [ 16384, 1, 1, 1 ]
llama_model_loader: - tensor 122: blk.9.ffn_down.weight f16 [ 16384, 4096, 1, 1 ]
llama_model_loader: - tensor 123: blk.9.ffn_down.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 124: blk.10.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 125: blk.10.attn_norm.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 126: blk.10.attn_qkv.weight f16 [ 4096, 12288, 1, 1 ]
llama_model_loader: - tensor 127: blk.10.attn_qkv.bias f32 [ 12288, 1, 1, 1 ]
llama_model_loader: - tensor 128: blk.10.attn_output.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 129: blk.10.attn_output.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 130: blk.10.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 131: blk.10.ffn_norm.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 132: blk.10.ffn_up.weight f16 [ 4096, 16384, 1, 1 ]
llama_model_loader: - tensor 133: blk.10.ffn_up.bias f32 [ 16384, 1, 1, 1 ]
llama_model_loader: - tensor 134: blk.10.ffn_down.weight f16 [ 16384, 4096, 1, 1 ]
llama_model_loader: - tensor 135: blk.10.ffn_down.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 136: blk.11.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 137: blk.11.attn_norm.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 138: blk.11.attn_qkv.weight f16 [ 4096, 12288, 1, 1 ]
llama_model_loader: - tensor 139: blk.11.attn_qkv.bias f32 [ 12288, 1, 1, 1 ]
llama_model_loader: - tensor 140: blk.11.attn_output.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 141: blk.11.attn_output.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 142: blk.11.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 143: blk.11.ffn_norm.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 144: blk.11.ffn_up.weight f16 [ 4096, 16384, 1, 1 ]
llama_model_loader: - tensor 145: blk.11.ffn_up.bias f32 [ 16384, 1, 1, 1 ]
llama_model_loader: - tensor 146: blk.11.ffn_down.weight f16 [ 16384, 4096, 1, 1 ]
llama_model_loader: - tensor 147: blk.11.ffn_down.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 148: blk.12.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 149: blk.12.attn_norm.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 150: blk.12.attn_qkv.weight f16 [ 4096, 12288, 1, 1 ]
llama_model_loader: - tensor 151: blk.12.attn_qkv.bias f32 [ 12288, 1, 1, 1 ]
llama_model_loader: - tensor 152: blk.12.attn_output.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 153: blk.12.attn_output.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 154: blk.12.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 155: blk.12.ffn_norm.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 156: blk.12.ffn_up.weight f16 [ 4096, 16384, 1, 1 ]
llama_model_loader: - tensor 157: blk.12.ffn_up.bias f32 [ 16384, 1, 1, 1 ]
llama_model_loader: - tensor 158: blk.12.ffn_down.weight f16 [ 16384, 4096, 1, 1 ]
llama_model_loader: - tensor 159: blk.12.ffn_down.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 160: blk.13.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 161: blk.13.attn_norm.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 162: blk.13.attn_qkv.weight f16 [ 4096, 12288, 1, 1 ]
llama_model_loader: - tensor 163: blk.13.attn_qkv.bias f32 [ 12288, 1, 1, 1 ]
llama_model_loader: - tensor 164: blk.13.attn_output.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 165: blk.13.attn_output.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 166: blk.13.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 167: blk.13.ffn_norm.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 168: blk.13.ffn_up.weight f16 [ 4096, 16384, 1, 1 ]
llama_model_loader: - tensor 169: blk.13.ffn_up.bias f32 [ 16384, 1, 1, 1 ]
llama_model_loader: - tensor 170: blk.13.ffn_down.weight f16 [ 16384, 4096, 1, 1 ]
llama_model_loader: - tensor 171: blk.13.ffn_down.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 172: blk.14.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 173: blk.14.attn_norm.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 174: blk.14.attn_qkv.weight f16 [ 4096, 12288, 1, 1 ]
llama_model_loader: - tensor 175: blk.14.attn_qkv.bias f32 [ 12288, 1, 1, 1 ]
llama_model_loader: - tensor 176: blk.14.attn_output.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 177: blk.14.attn_output.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 178: blk.14.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 179: blk.14.ffn_norm.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 180: blk.14.ffn_up.weight f16 [ 4096, 16384, 1, 1 ]
llama_model_loader: - tensor 181: blk.14.ffn_up.bias f32 [ 16384, 1, 1, 1 ]
llama_model_loader: - tensor 182: blk.14.ffn_down.weight f16 [ 16384, 4096, 1, 1 ]
llama_model_loader: - tensor 183: blk.14.ffn_down.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 184: blk.15.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 185: blk.15.attn_norm.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 186: blk.15.attn_qkv.weight f16 [ 4096, 12288, 1, 1 ]
llama_model_loader: - tensor 187: blk.15.attn_qkv.bias f32 [ 12288, 1, 1, 1 ]
llama_model_loader: - tensor 188: blk.15.attn_output.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 189: blk.15.attn_output.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 190: blk.15.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 191: blk.15.ffn_norm.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 192: blk.15.ffn_up.weight f16 [ 4096, 16384, 1, 1 ]
llama_model_loader: - tensor 193: blk.15.ffn_up.bias f32 [ 16384, 1, 1, 1 ]
llama_model_loader: - tensor 194: blk.15.ffn_down.weight f16 [ 16384, 4096, 1, 1 ]
llama_model_loader: - tensor 195: blk.15.ffn_down.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 196: blk.16.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 197: blk.16.attn_norm.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 198: blk.16.attn_qkv.weight f16 [ 4096, 12288, 1, 1 ]
llama_model_loader: - tensor 199: blk.16.attn_qkv.bias f32 [ 12288, 1, 1, 1 ]
llama_model_loader: - tensor 200: blk.16.attn_output.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 201: blk.16.attn_output.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 202: blk.16.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 203: blk.16.ffn_norm.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 204: blk.16.ffn_up.weight f16 [ 4096, 16384, 1, 1 ]
llama_model_loader: - tensor 205: blk.16.ffn_up.bias f32 [ 16384, 1, 1, 1 ]
llama_model_loader: - tensor 206: blk.16.ffn_down.weight f16 [ 16384, 4096, 1, 1 ]
llama_model_loader: - tensor 207: blk.16.ffn_down.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 208: blk.17.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 209: blk.17.attn_norm.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 210: blk.17.attn_qkv.weight f16 [ 4096, 12288, 1, 1 ]
llama_model_loader: - tensor 211: blk.17.attn_qkv.bias f32 [ 12288, 1, 1, 1 ]
llama_model_loader: - tensor 212: blk.17.attn_output.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 213: blk.17.attn_output.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 214: blk.17.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 215: blk.17.ffn_norm.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 216: blk.17.ffn_up.weight f16 [ 4096, 16384, 1, 1 ]
llama_model_loader: - tensor 217: blk.17.ffn_up.bias f32 [ 16384, 1, 1, 1 ]
llama_model_loader: - tensor 218: blk.17.ffn_down.weight f16 [ 16384, 4096, 1, 1 ]
llama_model_loader: - tensor 219: blk.17.ffn_down.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 220: blk.18.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 221: blk.18.attn_norm.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 222: blk.18.attn_qkv.weight f16 [ 4096, 12288, 1, 1 ]
llama_model_loader: - tensor 223: blk.18.attn_qkv.bias f32 [ 12288, 1, 1, 1 ]
llama_model_loader: - tensor 224: blk.18.attn_output.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 225: blk.18.attn_output.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 226: blk.18.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 227: blk.18.ffn_norm.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 228: blk.18.ffn_up.weight f16 [ 4096, 16384, 1, 1 ]
llama_model_loader: - tensor 229: blk.18.ffn_up.bias f32 [ 16384, 1, 1, 1 ]
llama_model_loader: - tensor 230: blk.18.ffn_down.weight f16 [ 16384, 4096, 1, 1 ]
llama_model_loader: - tensor 231: blk.18.ffn_down.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 232: blk.19.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 233: blk.19.attn_norm.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 234: blk.19.attn_qkv.weight f16 [ 4096, 12288, 1, 1 ]
llama_model_loader: - tensor 235: blk.19.attn_qkv.bias f32 [ 12288, 1, 1, 1 ]
llama_model_loader: - tensor 236: blk.19.attn_output.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 237: blk.19.attn_output.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 238: blk.19.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 239: blk.19.ffn_norm.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 240: blk.19.ffn_up.weight f16 [ 4096, 16384, 1, 1 ]
llama_model_loader: - tensor 241: blk.19.ffn_up.bias f32 [ 16384, 1, 1, 1 ]
llama_model_loader: - tensor 242: blk.19.ffn_down.weight f16 [ 16384, 4096, 1, 1 ]
llama_model_loader: - tensor 243: blk.19.ffn_down.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 244: blk.20.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 245: blk.20.attn_norm.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 246: blk.20.attn_qkv.weight f16 [ 4096, 12288, 1, 1 ]
llama_model_loader: - tensor 247: blk.20.attn_qkv.bias f32 [ 12288, 1, 1, 1 ]
llama_model_loader: - tensor 248: blk.20.attn_output.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 249: blk.20.attn_output.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 250: blk.20.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 251: blk.20.ffn_norm.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 252: blk.20.ffn_up.weight f16 [ 4096, 16384, 1, 1 ]
llama_model_loader: - tensor 253: blk.20.ffn_up.bias f32 [ 16384, 1, 1, 1 ]
llama_model_loader: - tensor 254: blk.20.ffn_down.weight f16 [ 16384, 4096, 1, 1 ]
llama_model_loader: - tensor 255: blk.20.ffn_down.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 256: blk.21.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 257: blk.21.attn_norm.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 258: blk.21.attn_qkv.weight f16 [ 4096, 12288, 1, 1 ]
llama_model_loader: - tensor 259: blk.21.attn_qkv.bias f32 [ 12288, 1, 1, 1 ]
llama_model_loader: - tensor 260: blk.21.attn_output.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 261: blk.21.attn_output.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 262: blk.21.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 263: blk.21.ffn_norm.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 264: blk.21.ffn_up.weight f16 [ 4096, 16384, 1, 1 ]
llama_model_loader: - tensor 265: blk.21.ffn_up.bias f32 [ 16384, 1, 1, 1 ]
llama_model_loader: - tensor 266: blk.21.ffn_down.weight f16 [ 16384, 4096, 1, 1 ]
llama_model_loader: - tensor 267: blk.21.ffn_down.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 268: blk.22.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 269: blk.22.attn_norm.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 270: blk.22.attn_qkv.weight f16 [ 4096, 12288, 1, 1 ]
llama_model_loader: - tensor 271: blk.22.attn_qkv.bias f32 [ 12288, 1, 1, 1 ]
llama_model_loader: - tensor 272: blk.22.attn_output.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 273: blk.22.attn_output.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 274: blk.22.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 275: blk.22.ffn_norm.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 276: blk.22.ffn_up.weight f16 [ 4096, 16384, 1, 1 ]
llama_model_loader: - tensor 277: blk.22.ffn_up.bias f32 [ 16384, 1, 1, 1 ]
llama_model_loader: - tensor 278: blk.22.ffn_down.weight f16 [ 16384, 4096, 1, 1 ]
llama_model_loader: - tensor 279: blk.22.ffn_down.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 280: blk.23.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 281: blk.23.attn_norm.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 282: blk.23.attn_qkv.weight f16 [ 4096, 12288, 1, 1 ]
llama_model_loader: - tensor 283: blk.23.attn_qkv.bias f32 [ 12288, 1, 1, 1 ]
llama_model_loader: - tensor 284: blk.23.attn_output.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 285: blk.23.attn_output.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 286: blk.23.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 287: blk.23.ffn_norm.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 288: blk.23.ffn_up.weight f16 [ 4096, 16384, 1, 1 ]
llama_model_loader: - tensor 289: blk.23.ffn_up.bias f32 [ 16384, 1, 1, 1 ]
llama_model_loader: - tensor 290: blk.23.ffn_down.weight f16 [ 16384, 4096, 1, 1 ]
llama_model_loader: - tensor 291: blk.23.ffn_down.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 292: blk.24.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 293: blk.24.attn_norm.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 294: blk.24.attn_qkv.weight f16 [ 4096, 12288, 1, 1 ]
llama_model_loader: - tensor 295: blk.24.attn_qkv.bias f32 [ 12288, 1, 1, 1 ]
llama_model_loader: - tensor 296: blk.24.attn_output.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 297: blk.24.attn_output.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 298: blk.24.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 299: blk.24.ffn_norm.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 300: blk.24.ffn_up.weight f16 [ 4096, 16384, 1, 1 ]
llama_model_loader: - tensor 301: blk.24.ffn_up.bias f32 [ 16384, 1, 1, 1 ]
llama_model_loader: - tensor 302: blk.24.ffn_down.weight f16 [ 16384, 4096, 1, 1 ]
llama_model_loader: - tensor 303: blk.24.ffn_down.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 304: blk.25.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 305: blk.25.attn_norm.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 306: blk.25.attn_qkv.weight f16 [ 4096, 12288, 1, 1 ]
llama_model_loader: - tensor 307: blk.25.attn_qkv.bias f32 [ 12288, 1, 1, 1 ]
llama_model_loader: - tensor 308: blk.25.attn_output.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 309: blk.25.attn_output.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 310: blk.25.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 311: blk.25.ffn_norm.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 312: blk.25.ffn_up.weight f16 [ 4096, 16384, 1, 1 ]
llama_model_loader: - tensor 313: blk.25.ffn_up.bias f32 [ 16384, 1, 1, 1 ]
llama_model_loader: - tensor 314: blk.25.ffn_down.weight f16 [ 16384, 4096, 1, 1 ]
llama_model_loader: - tensor 315: blk.25.ffn_down.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 316: blk.26.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 317: blk.26.attn_norm.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 318: blk.26.attn_qkv.weight f16 [ 4096, 12288, 1, 1 ]
llama_model_loader: - tensor 319: blk.26.attn_qkv.bias f32 [ 12288, 1, 1, 1 ]
llama_model_loader: - tensor 320: blk.26.attn_output.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 321: blk.26.attn_output.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 322: blk.26.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 323: blk.26.ffn_norm.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 324: blk.26.ffn_up.weight f16 [ 4096, 16384, 1, 1 ]
llama_model_loader: - tensor 325: blk.26.ffn_up.bias f32 [ 16384, 1, 1, 1 ]
llama_model_loader: - tensor 326: blk.26.ffn_down.weight f16 [ 16384, 4096, 1, 1 ]
llama_model_loader: - tensor 327: blk.26.ffn_down.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 328: blk.27.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 329: blk.27.attn_norm.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 330: blk.27.attn_qkv.weight f16 [ 4096, 12288, 1, 1 ]
llama_model_loader: - tensor 331: blk.27.attn_qkv.bias f32 [ 12288, 1, 1, 1 ]
llama_model_loader: - tensor 332: blk.27.attn_output.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 333: blk.27.attn_output.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 334: blk.27.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 335: blk.27.ffn_norm.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 336: blk.27.ffn_up.weight f16 [ 4096, 16384, 1, 1 ]
llama_model_loader: - tensor 337: blk.27.ffn_up.bias f32 [ 16384, 1, 1, 1 ]
llama_model_loader: - tensor 338: blk.27.ffn_down.weight f16 [ 16384, 4096, 1, 1 ]
llama_model_loader: - tensor 339: blk.27.ffn_down.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 340: blk.28.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 341: blk.28.attn_norm.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 342: blk.28.attn_qkv.weight f16 [ 4096, 12288, 1, 1 ]
llama_model_loader: - tensor 343: blk.28.attn_qkv.bias f32 [ 12288, 1, 1, 1 ]
llama_model_loader: - tensor 344: blk.28.attn_output.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 345: blk.28.attn_output.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 346: blk.28.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 347: blk.28.ffn_norm.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 348: blk.28.ffn_up.weight f16 [ 4096, 16384, 1, 1 ]
llama_model_loader: - tensor 349: blk.28.ffn_up.bias f32 [ 16384, 1, 1, 1 ]
llama_model_loader: - tensor 350: blk.28.ffn_down.weight f16 [ 16384, 4096, 1, 1 ]
llama_model_loader: - tensor 351: blk.28.ffn_down.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 352: blk.29.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 353: blk.29.attn_norm.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 354: blk.29.attn_qkv.weight f16 [ 4096, 12288, 1, 1 ]
llama_model_loader: - tensor 355: blk.29.attn_qkv.bias f32 [ 12288, 1, 1, 1 ]
llama_model_loader: - tensor 356: blk.29.attn_output.weight f16 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 357: blk.29.attn_output.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 358: blk.29.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 359: blk.29.ffn_norm.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 360: blk.29.ffn_up.weight f16 [ 4096, 16384, 1, 1 ]
llama_model_loader: - tensor 361: blk.29.ffn_up.bias f32 [ 16384, 1, 1, 1 ]
llama_model_loader: - tensor 362: blk.29.ffn_down.weight f16 [ 16384, 4096, 1, 1 ]
llama_model_loader: - tensor 363: blk.29.ffn_down.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 364: output_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 365: output_norm.bias f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - kv 0: general.architecture str
llama_model_loader: - kv 1: general.name str
llama_model_loader: - kv 2: bloom.context_length u32
llama_model_loader: - kv 3: bloom.embedding_length u32
llama_model_loader: - kv 4: bloom.feed_forward_length u32
llama_model_loader: - kv 5: bloom.block_count u32
llama_model_loader: - kv 6: bloom.attention.head_count u32
llama_model_loader: - kv 7: bloom.attention.head_count_kv u32
llama_model_loader: - kv 8: bloom.attention.layer_norm_epsilon f32
llama_model_loader: - kv 9: general.file_type u32
llama_model_loader: - kv 10: tokenizer.ggml.model str
llama_model_loader: - kv 11: tokenizer.ggml.tokens arr
llama_model_loader: - kv 12: tokenizer.ggml.scores arr
llama_model_loader: - kv 13: tokenizer.ggml.token_type arr
llama_model_loader: - kv 14: tokenizer.ggml.merges arr
llama_model_loader: - kv 15: tokenizer.ggml.bos_token_id u32
llama_model_loader: - kv 16: tokenizer.ggml.eos_token_id u32
llama_model_loader: - kv 17: tokenizer.ggml.unknown_token_id u32
llama_model_loader: - kv 18: tokenizer.ggml.padding_token_id u32
llama_model_loader: - type f32: 244 tensors
llama_model_loader: - type f16: 122 tensors
llm_load_vocab: mismatch in special tokens definition ( 203/250880 vs 0/250880 ).
llm_load_print_meta: format = GGUF V2 (latest)
llm_load_print_meta: arch = bloom
llm_load_print_meta: vocab type = BPE
llm_load_print_meta: n_vocab = 250880
llm_load_print_meta: n_merges = 250434
llm_load_print_meta: n_ctx_train = 4096
llm_load_print_meta: n_embd = 4096
llm_load_print_meta: n_head = 32
llm_load_print_meta: n_head_kv = 32
llm_load_print_meta: n_layer = 30
llm_load_print_meta: n_rot = 128
llm_load_print_meta: n_gqa = 1
llm_load_print_meta: f_norm_eps = 1.0e-05
llm_load_print_meta: f_norm_rms_eps = 0.0e+00
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: n_ff = 16384
llm_load_print_meta: freq_base_train = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: model type = 7B
llm_load_print_meta: model ftype = mostly F16
llm_load_print_meta: model params = 8.10 B
llm_load_print_meta: model size = 15.08 GiB (16.00 BPW)
llm_load_print_meta: general.name = Bloom
llm_load_print_meta: BOS token = 1 '<s>'
llm_load_print_meta: EOS token = 2 '</s>'
llm_load_print_meta: UNK token = 0 '<unk>'
llm_load_print_meta: PAD token = 3 '<pad>'
llm_load_print_meta: LF token = 130 'Ä'
llm_load_tensors: ggml ctx size = 0.12 MB
llm_load_tensors: using CUDA for GPU acceleration
ggml_cuda_set_main_device: using device 0 (NVIDIA TITAN RTX) as main device
llm_load_tensors: mem required = 1960.15 MB
llm_load_tensors: offloading 30 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 33/33 layers to GPU
llm_load_tensors: VRAM used: 13486.12 MB
...GGML_ASSERT: /llama.cpp/ggml-cuda.cu:6115: false
Aborted (core dumped)
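For reference, a backtrace at the failing assert can be captured by re-running the same command under gdb (sketch; assumes gdb is available and the binary has debug info):
gdb --args ./build/bin/main -m models/bloom-7b1/ggml-model-f16.gguf -n 256 -b 512 -c 512 -f ../prompt.txt --threads 32 --temp 0.1 --top-p 0.75 --top-k 40 -cb -ngl 33
(gdb) run
(gdb) bt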