Description
System: openSUSE Leap 15.4
GPU: AMD RX580 8GB
Vulkan Instance Version: 1.3.275
VkPhysicalDeviceMemoryProperties:
memoryHeaps: count = 3
    memoryHeaps[0]:
        size   = 8321499136 (0x1f0000000) (7.75 GiB)
        budget = 8310034432 (0x1ef511000) (7.74 GiB)
        usage  = 0 (0x00000000) (0.00 B)
        flags: count = 1
            MEMORY_HEAP_DEVICE_LOCAL_BIT
    memoryHeaps[1]:
        size   = 4133625856 (0xf6622000) (3.85 GiB)
        budget = 4124254208 (0xf5d32000) (3.84 GiB)
        usage  = 0 (0x00000000) (0.00 B)
        flags:
            None
    memoryHeaps[2]:
        size   = 268435456 (0x10000000) (256.00 MiB)
        budget = 256970752 (0x0f511000) (245.07 MiB)
        usage  = 0 (0x00000000) (0.00 B)
        flags: count = 1
            MEMORY_HEAP_DEVICE_LOCAL_BIT
Vulkan0: AMD RADV POLARIS10 | uma: 0 | fp16: 0 | warp size: 64
GGUF model: mistral-7b-instruct-v0.2.Q6_K.gguf
First of all, thanks to Occam for the new Vulkan implementation of llama.cpp!
I tried to run llama.cpp following the "without Docker" instructions:
--> ./bin/main -m "PATH_TO_mistral-7b-instruct-v0.2.Q6_K.gguf" -p "Hi you how are you" -n 50 -e -ngl 33 -t 4
Got an error:
ggml_vulkan: Device memory allocation of size 4257734656 failed.
ggml_vulkan: vk::Device::allocateMemory: ErrorOutOfDeviceMemory
llama_model_load: error loading model: vk::Device::allocateMemory: ErrorOutOfDeviceMemory
llama_load_model_from_file: failed to load model
llama_init_from_gpt_params: error: failed to load model '/root/GPT/GGUF/mistral-7b-instruct-v0.2.Q6_K.gguf'
main: error: unable to load model
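For what it's worth, the failed allocation (4257734656 bytes, i.e. ~3.97 GiB) is well under the 7.75 GiB device-local heap, so this looks like a per-allocation cap rather than the heap actually being exhausted. Below is a minimal diagnostic sketch (my own code, not part of llama.cpp) that prints each device's maxMemoryAllocationSize from the core Vulkan 1.1 maintenance3 properties; if RADV reports a value below ~4 GiB for this card, that would explain why the single large buffer fails:

```cpp
// check_max_alloc.cpp - diagnostic sketch, not part of llama.cpp.
// Prints the per-allocation cap (maxMemoryAllocationSize) for every Vulkan device.
// Build: g++ check_max_alloc.cpp -lvulkan -o check_max_alloc
#include <vulkan/vulkan.h>
#include <cstdio>
#include <vector>

int main() {
    VkApplicationInfo app = { VK_STRUCTURE_TYPE_APPLICATION_INFO };
    app.apiVersion = VK_API_VERSION_1_1;   // maintenance3 is core in Vulkan 1.1

    VkInstanceCreateInfo ici = { VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO };
    ici.pApplicationInfo = &app;

    VkInstance instance;
    if (vkCreateInstance(&ici, nullptr, &instance) != VK_SUCCESS) {
        fprintf(stderr, "vkCreateInstance failed\n");
        return 1;
    }

    uint32_t count = 0;
    vkEnumeratePhysicalDevices(instance, &count, nullptr);
    std::vector<VkPhysicalDevice> devices(count);
    vkEnumeratePhysicalDevices(instance, &count, devices.data());

    for (VkPhysicalDevice dev : devices) {
        VkPhysicalDeviceMaintenance3Properties maint3 = {
            VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_MAINTENANCE_3_PROPERTIES };
        VkPhysicalDeviceProperties2 props2 = {
            VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_PROPERTIES_2 };
        props2.pNext = &maint3;            // chain the maintenance3 query
        vkGetPhysicalDeviceProperties2(dev, &props2);

        printf("%s: maxMemoryAllocationSize = %llu bytes (%.2f GiB)\n",
               props2.properties.deviceName,
               (unsigned long long) maint3.maxMemoryAllocationSize,
               maint3.maxMemoryAllocationSize / (1024.0 * 1024.0 * 1024.0));
    }

    vkDestroyInstance(instance, nullptr);
    return 0;
}
```

vulkaninfo should print the same value under VkPhysicalDeviceMaintenance3Properties, if you would rather not compile anything.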
FYI:
If I reduce -ngl to 23 layers, everything works properly, but slowly.
llm_load_tensors: CPU buffer size = 5666.09 MiB
llm_load_tensors: Vulkan0 buffer size = 3925.09 MiB
6.51 tokens per second (the OpenCL version loads the full model on the same GPU and shows ~12 tokens/sec)
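In case it helps anyone reproduce the workaround programmatically, here is a minimal sketch of the same partial offload through llama.cpp's C API (a hypothetical standalone program, not from my run; function names follow llama.h, but the llama_backend_init signature has changed between revisions, so adjust for your build):

```cpp
// offload23.cpp - sketch of the -ngl 23 workaround via the C API; see assumptions above.
#include "llama.h"
#include <cstdio>

int main() {
    llama_backend_init(false);      // false = no NUMA; newer builds take no argument

    llama_model_params mparams = llama_model_default_params();
    mparams.n_gpu_layers = 23;      // same as -ngl 23: the most that fit on the RX580

    llama_model * model = llama_load_model_from_file(
        "/root/GPT/GGUF/mistral-7b-instruct-v0.2.Q6_K.gguf", mparams);
    if (model == NULL) {
        fprintf(stderr, "failed to load model\n");
        return 1;
    }

    // ... create a context with llama_new_context_with_model() and generate as usual ...

    llama_free_model(model);
    llama_backend_free();
    return 0;
}
```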