Closed
Name and Version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA vGPU-32GB, compute capability 8.9, VMM: yes
version: 4954 (3cd3a39)
built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu
Operating systems
Linux
GGML backends
CUDA
Hardware
4080S 32G
Models
No response
Problem description & steps to reproduce
If llama.cpp is compiled with CUDA, llama-llava-clip-quantize-cli crashes when quantizing the vision part of the CLIP model. After some debugging, I traced the error to the area shown below.
This is most likely caused by the quantization code being unable to access tensor memory that lives in the GPU backend: it only runs correctly when llama.cpp is compiled with the CPU backend. Has anyone else encountered this problem?
./build/bin/llama-llava-clip-quantize-cli ~/autodl-tmp/llava-v1.5-7b/mmproj-model-f16.gguf ~/autodl-tmp/llava-v1.5-7b/mmproj-model-Q4_0.gguf 2
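As a workaround until this is fixed, rebuilding with the CUDA backend disabled makes the tool run, since all tensors then stay in host memory. A sketch of that rebuild (the `-DGGML_CUDA=OFF` flag follows the current llama.cpp CMake options; the target name is assumed to match the binary name and may differ in your checkout):

```shell
# Rebuild into a separate directory with the CUDA backend disabled.
cmake -B build-cpu -DGGML_CUDA=OFF
cmake --build build-cpu --target llama-llava-clip-quantize-cli -j

# Run the quantizer from the CPU-only build.
./build-cpu/bin/llama-llava-clip-quantize-cli \
    ~/autodl-tmp/llava-v1.5-7b/mmproj-model-f16.gguf \
    ~/autodl-tmp/llava-v1.5-7b/mmproj-model-Q4_0.gguf 2
```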
First Bad Commit
No response
Relevant log output
(llamacpp) root@autodl-container-1a0b499d52-72782394:~/llama.cpp# ./build/bin/llama-llava-clip-quantize-cli ~/autodl-tmp/llava-v1.5-7b/mmproj-model-f16.gguf ~/autodl-tmp/llava-v1.5-7b/mmproj-model-Q4_0.gguf 2
clip_init: model name: BGE-VL-large
clip_init: description: image encoder for LLaVA
clip_init: GGUF version: 3
clip_init: alignment: 32
clip_init: n_tensors: 377
clip_init: n_kv: 19
clip_init: ftype: f16
clip_init: loaded meta data with 19 key-value pairs and 377 tensors from /root/autodl-tmp/llava-v1.5-7b/mmproj-model-f16.gguf
clip_init: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
clip_init: - kv 0: general.architecture str = clip
clip_init: - kv 1: clip.has_text_encoder bool = false
clip_init: - kv 2: clip.has_vision_encoder bool = true
clip_init: - kv 3: clip.has_llava_projector bool = true
clip_init: - kv 4: general.file_type u32 = 1
clip_init: - kv 5: general.name str = BGE-VL-large
clip_init: - kv 6: general.description str = image encoder for LLaVA
clip_init: - kv 7: clip.projector_type str = mlp
clip_init: - kv 8: clip.vision.image_size u32 = 224
clip_init: - kv 9: clip.vision.patch_size u32 = 14
clip_init: - kv 10: clip.vision.embedding_length u32 = 1024
clip_init: - kv 11: clip.vision.feed_forward_length u32 = 4096
clip_init: - kv 12: clip.vision.projection_dim u32 = 768
clip_init: - kv 13: clip.vision.attention.head_count u32 = 16
clip_init: - kv 14: clip.vision.attention.layer_norm_epsilon f32 = 0.000010
clip_init: - kv 15: clip.vision.block_count u32 = 23
clip_init: - kv 16: clip.vision.image_mean arr[f32,3] = [0.481455, 0.457828, 0.408211]
clip_init: - kv 17: clip.vision.image_std arr[f32,3] = [0.268630, 0.261303, 0.275777]
clip_init: - kv 18: clip.use_gelu bool = false
clip_init: - type f32: 235 tensors
clip_init: - type f16: 142 tensors
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA vGPU-32GB, compute capability 8.9, VMM: yes
clip_ctx: CLIP using CUDA0 backend
key clip.use_silu not found in file
clip_init: text_encoder: 0
clip_init: vision_encoder: 1
clip_init: llava_projector: 1
clip_init: minicpmv_projector: 0
clip_init: minicpmv_version: 2
clip_init: glm_projector: 0
clip_init: model size: 594.86 MB
clip_init: metadata size: 0.13 MB
clip_init: params backend buffer size = 594.86 MB (377 tensors)
key clip.vision.image_grid_pinpoints not found in file
key clip.vision.feature_layer not found in file
key clip.vision.mm_patch_merge_type not found in file
key clip.vision.image_crop_resolution not found in file
clip_init: vision model hparams
image_size 224
patch_size 14
v_hidden_size 1024
v_n_intermediate 4096
v_projection_dim 768
v_n_head 16
v_n_layer 23
v_eps 0.000010
v_image_mean 0.481455 0.457828 0.408211
v_image_std 0.268630 0.261303 0.275777
v_image_grid_pinpoints:
v_vision_feature_layer:
v_mm_patch_merge_type: flat
clip_init: CUDA0 compute buffer size = 9.63 MiB
clip_init: CPU compute buffer size = 1.58 MiB
Segmentation fault (core dumped)