
[User] Segfault when saving session cache since ecb217d #1699

Closed
@sgentle

Description

Prerequisites

Please answer the following questions for yourself before submitting an issue.

  • I am running the latest code. Development is very rapid so there are no tagged versions as of now.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new bug or useful enhancement to share.

Expected Behavior

./main should save the session cache to the --prompt-cache file and keep generating, with no segfault.

Current Behavior

Since commit ecb217d, ./main segfaults while saving the session cache. In a debug build the crash shows up as a GGML_ASSERT abort on buffer alignment instead (see backtrace below).

Environment and Context

Please provide detailed information about your computer setup. This is important in case the issue is not reproducible except for under certain specific conditions.

  • Physical (or virtual) hardware you are using (macOS, Apple silicon):
$ system_profiler SPHardwareDataType
Hardware:

    Hardware Overview:

      Model Name: Mac Studio
      Model Identifier: Mac13,1
      Model Number: Z14J000LLX/A
      Chip: Apple M1 Max
      Total Number of Cores: 10 (8 performance and 2 efficiency)
      Memory: 64 GB
      System Firmware Version: 8422.100.650
      OS Loader Version: 8422.100.650
  • Operating System:
$ sw_vers
ProductName:		macOS
ProductVersion:		13.3.1
ProductVersionExtra:	(a)
BuildVersion:		22E772610a

$ uname -a
Darwin workstation.local 22.4.0 Darwin Kernel Version 22.4.0: Mon Mar  6 20:59:28 PST 2023; root:xnu-8796.101.5~3/RELEASE_ARM64_T6000 arm64
  • SDK version:
$ python3 --version
Python 3.11.3

$ make --version
GNU Make 3.81
Copyright (C) 2006  Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.

This program built for i386-apple-darwin11.3.0

$ g++ --version
Apple clang version 14.0.3 (clang-1403.0.22.14.1)
Target: arm64-apple-darwin22.4.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

Failure Information (for bugs)

Segfault when saving session cache since ecb217d

Steps to Reproduce

  1. $ git checkout ecb217d
  2. $ make clean; make
  3. $ rm -f /tmp/prompt.cache; ./main -m ./models/7B/ggml-model-q4_0.bin -n 50 -s 0 -p "Top 10 cat memes:" --prompt-cache "/tmp/prompt.cache"

Failure Logs

$ rm -f /tmp/prompt.cache; ./main -m ./models/7B/ggml-model-q4_0.bin -n 50 -s 0 -p "Top 10 cat memes:" --prompt-cache "/tmp/prompt.cache"
main: build = 612 (ecb217d)
main: seed  = 0
llama.cpp: loading model from ./models/7B/ggml-model-q4_0.bin
llama_model_load_internal: format     = ggjt v3 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 512
llama_model_load_internal: n_embd     = 4096
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 32
llama_model_load_internal: n_layer    = 32
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 2 (mostly Q4_0)
llama_model_load_internal: n_ff       = 11008
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size =    0.07 MB
llama_model_load_internal: mem required  = 1932.71 MB (+ 1026.00 MB per state)
.
llama_init_from_file: kv self size  =  256.00 MB

system_info: n_threads = 8 / 10 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | VSX = 0 |
main: attempting to load saved session from '/tmp/prompt.cache'
main: session file does not exist, will create
sampling: repeat_last_n = 64, repeat_penalty = 1.100000, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.950000, typical_p = 1.000000, temp = 0.800000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 512, n_batch = 512, n_predict = 50, n_keep = 0


 Top 10 cat memes:fish: Job 1, './main -m ./models/7B/ggml-mode…' terminated by signal SIGSEGV (Address boundary error)

Backtrace after rebuilding with LLAMA_DEBUG=1 (in the debug build the crash surfaces as a GGML_ASSERT abort rather than a raw SIGSEGV):

$ rm -f /tmp/prompt.cache; lldb -b -o 'run' -k 'bt' -- ./main -m ./models/7B/ggml-model-q4_0.bin -n 50 -s 0 -p "Top 10 cat memes:" --prompt-cache "/tmp/prompt.cache"
(lldb) target create "./main"
Current executable set to '/tmp/llama.cpp/main' (arm64).
(lldb) settings set -- target.run-args  "-m" "./models/7B/ggml-model-q4_0.bin" "-n" "50" "-s" "0" "-p" "Top 10 cat memes:" "--prompt-cache" "/tmp/prompt.cache"
(lldb) run
main: build = 612 (ecb217d)
main: seed  = 0
llama.cpp: loading model from ./models/7B/ggml-model-q4_0.bin
llama_model_load_internal: format     = ggjt v3 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 512
llama_model_load_internal: n_embd     = 4096
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 32
llama_model_load_internal: n_layer    = 32
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 2 (mostly Q4_0)
llama_model_load_internal: n_ff       = 11008
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size =    0.07 MB
llama_model_load_internal: mem required  = 1932.71 MB (+ 1026.00 MB per state)
.
llama_init_from_file: kv self size  =  256.00 MB

system_info: n_threads = 8 / 10 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | VSX = 0 |
main: attempting to load saved session from '/tmp/prompt.cache'
main: session file does not exist, will create
sampling: repeat_last_n = 64, repeat_penalty = 1.100000, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.950000, typical_p = 1.000000, temp = 0.800000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 512, n_batch = 512, n_predict = 50, n_keep = 0


 Top 10 cat memes:GGML_ASSERT: ggml.c:3986: ((uintptr_t) (ctx->mem_buffer))%GGML_MEM_ALIGN == 0
Process 22227 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
    frame #0: 0x000000018d520724 libsystem_kernel.dylib`__pthread_kill + 8
libsystem_kernel.dylib`:
->  0x18d520724 <+8>:  b.lo   0x18d520744               ; <+40>
    0x18d520728 <+12>: pacibsp
    0x18d52072c <+16>: stp    x29, x30, [sp, #-0x10]!
    0x18d520730 <+20>: mov    x29, sp
Target 0: (main) stopped.
Process 22227 launched: '/tmp/llama.cpp/main' (arm64)
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
  * frame #0: 0x000000018d520724 libsystem_kernel.dylib`__pthread_kill + 8
    frame #1: 0x000000018d557c28 libsystem_pthread.dylib`pthread_kill + 288
    frame #2: 0x000000018d465ae8 libsystem_c.dylib`abort + 180
    frame #3: 0x0000000100010f34 main`ggml_init(params=(mem_size = 4096, mem_buffer = 0x000000016fdebd98, no_alloc = true)) at ggml.c:3986:5
    frame #4: 0x000000010004d174 main`::llama_copy_state_data(ctx=0x0000000101009c00, dst=" \U0000001a") at llama.cpp:2739:38
    frame #5: 0x000000010004e8d4 main`::llama_save_session_file(ctx=0x0000000101009c00, path_session="/tmp/prompt.cache", tokens=0x0000600000d7f660, n_token_count=9) at llama.cpp:2956:41
    frame #6: 0x0000000100003aac main`main(argc=11, argv=0x000000016fdfe6c0) at main.cpp:422:17
    frame #7: 0x000000018d1fff28 dyld`start + 2236

Labels

bug (Something isn't working)