
Bug: llama-server crash when defragmenting (llama_kv_cache_defrag_internal) #9314

Closed
@ExtReMLapin

Description

What happened?

When I run the server with the following arguments:

./llama.cpp/llama-server --host 0.0.0.0 --port 55777 --model /opt/IdExtend/models/llm/c4ai-command-r-08-2024-Q5_K_M.gguf --flash-attn --cache-type-k q4_0 --cache-type-v q4_0 --defrag-thold 0.5 --ctx-size 60000 --threads-http 16 -np 2 --tensor-split 0.6958696919102823,0.30413030808971775,0.0 -ngl 99999

The data sent is something like this:

{
    "prompt": <aroun 19000 tokens>,
    "temperature": 0.3,
    "presence_penalty": 0.0,
    "frequency_penalty": 0.0,
    "repeat_penalty": 1.0,
    "stream": true,
    "n_keep": 30000,
    "n_predict": 20219
}
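
For reference, a request of this shape can be replayed with curl against the server's /completion endpoint (a hypothetical reconstruction; the actual ~19000-token prompt is replaced by a placeholder here):

# Hypothetical replay of the request above; the real prompt is ~19000 tokens.
curl -s http://localhost:55777/completion \
    -H "Content-Type: application/json" \
    -d '{"prompt": "<around 19000 tokens>", "temperature": 0.3, "presence_penalty": 0.0, "frequency_penalty": 0.0, "repeat_penalty": 1.0, "stream": true, "n_keep": 30000, "n_predict": 20219}'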

I use it to support 2 concurrent users with a context of 30k tokens each.

Across different requests, the server crashes quickly (after fewer than 10 requests).
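
A minimal load pattern matching this setup can be sketched as two concurrent requests in a loop (hypothetical; prompt.json stands in for the payload above):

# Hypothetical repro: keep both server slots (-np 2) busy until the crash appears.
# prompt.json holds a payload like the one shown above.
for i in $(seq 1 10); do
    curl -s http://localhost:55777/completion -H "Content-Type: application/json" --data @prompt.json > /dev/null &
    curl -s http://localhost:55777/completion -H "Content-Type: application/json" --data @prompt.json > /dev/null &
    wait
done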

I did manage to get a crash dump out of it (please see the full GDB backtrace in the attached file).

The crash dump itself is 1 GB; if you need it, I can try to find a place to upload it together with the llama-server build I used.
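
The backtrace below was extracted from the core dump with gdb; a command along these lines should reproduce it (the core file name is an assumption):

# Dump a full backtrace with locals from the core file (name assumed to be "core").
gdb --batch -ex "bt full" ./llama.cpp/llama-server core > bt.txt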

#0  0x00007fa921c419fc in pthread_kill () from /lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#1  0x00007fa921bed476 in raise () from /lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#2  0x00007fa921bd37f3 in abort () from /lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#3  0x00007fa921c34676 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#4  0x00007fa921c4bcfc in ?? () from /lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#5  0x00007fa921c4c7cc in ?? () from /lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#6  0x00007fa921c4d8b9 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#7  0x00007fa921c50453 in free () from /lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#8  0x000056277e643dab in ggml_hash_set_free (hash_set=<optimized out>) at ggml/src/ggml.c:17794
No locals.
#9  0x000056277e652d6d in ggml_gallocr_reserve_n (galloc=0x56278492a730, graph=graph@entry=0x562783b0dd58, node_buffer_ids=0x5627846edce0, leaf_buffer_ids=0x562784795cf0) at ggml/src/ggml-alloc.c:677
        min_hash_size = 28600
        __func__ = "ggml_gallocr_reserve_n"
#10 0x000056277e658b44 in ggml_backend_sched_alloc_splits (sched=<optimized out>) at ggml/src/ggml-backend.c:1752
        backend_ids_changed = <optimized out>
        backend_ids_changed = <optimized out>
        __func__ = "ggml_backend_sched_alloc_splits"
        i = <optimized out>
        i = <optimized out>
#11 ggml_backend_sched_alloc_graph (sched=0x562783b0dc00, graph=<optimized out>) at ggml/src/ggml-backend.c:1968
No locals.
#12 0x000056277e65911a in ggml_backend_sched_graph_compute_async (sched=0x562783b0dc00, graph=0x5627893f04e0) at ggml/src/ggml-backend.c:1989
No locals.
#13 0x000056277e6a5079 in llama_graph_compute (threadpool=0x0, n_threads=<optimized out>, gf=0x5627893f04e0, lctx=...) at src/llama.cpp:16023
No locals.
#14 llama_kv_cache_defrag_internal (lctx=...) at src/llama.cpp:16691
        n_kv = 40340
        n_used = 12240
        n_moves = <optimized out>
        max_moves = <optimized out>
        kv_self = <optimized out>
        hparams = <optimized out>
        n_layer = <optimized out>
        ids = std::vector of length 40340, capacity 40340 = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 
          38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 
          87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 
          128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 
          167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199...}
        gf = 0x5627893f04e0
        kv_self = <optimized out>
        hparams = <optimized out>
        n_layer = <optimized out>
        n_kv = <optimized out>
        n_used = <optimized out>
        n_moves = <optimized out>
        max_moves = <optimized out>
        ids = <optimized out>
        gf = <optimized out>
        i0 = <optimized out>
        cell0 = <optimized out>
        nh = <optimized out>
        nf = <optimized out>
        is = <optimized out>
        i1 = <optimized out>
        cont = <optimized out>
        stop = <optimized out>
        cell1 = <optimized out>
        cell1 = <optimized out>
#15 llama_kv_cache_update_internal (lctx=...) at src/llama.cpp:16735
        need_reserve = <optimized out>
        need_reserve = <optimized out>
        __func__ = <optimized out>
        gf = <optimized out>
        kv_self = <optimized out>
        i = <optimized out>
        n_seqs = <optimized out>
        n_tokens = <optimized out>
        token = <optimized out>
        ubatch = <optimized out>
        gf = <optimized out>
#16 llama_kv_cache_update (ctx=0x562783b291f0) at src/llama.cpp:18925
No locals.
#17 0x000056277e6f94ab in llama_decode_internal (batch_all=..., lctx=...) at src/llama.cpp:16141

bt.txt

Name and Version

version 1884 (c28f4be)

What operating system are you seeing the problem on?

No response

Relevant log output

(please see the GDB backtrace above)
