Open
Description
Currently, ggml force-aborts the process whenever a CUDA malloc fails, e.g.:
#2 0x00007f99c75ca66e in ggml_abort.cold ()
#3 0x00007f99c7b57882 in ggml_cuda_error(char const*, char const*, char const*, int, char const*) ()
#4 0x00007f99c7b5ae80 in ggml_cuda_mul_mat_batched_cublas(ggml_backend_cuda_context&, ggml_tensor const*, ggml_tensor const*, ggml_tensor*) ()
#5 0x00007f99c7b648a6 in ggml_cuda_mul_mat(ggml_backend_cuda_context&, ggml_tensor const*, ggml_tensor const*, ggml_tensor*) ()
#6 0x00007f99c7b6a2e8 in ggml_backend_cuda_graph_compute(ggml_backend*, ggml_cgraph*)
#7 0x00007f99c7b21715 in ggml_backend_sched_graph_compute_async () from /home/nbuild/pub/xmt/latest/lib/libsdl-xnn-ggml.so
This is not ideal in some production contexts, where we need a controlled way to return an OOM error and exit/reload/resume/skip gracefully.
Would you mind if I:
- add an option (e.g. GGML_NO_ABORT_ON_OOM) to skip the abort on malloc failures
- return GGML_STATUS_ALLOC_FAILED to the callers up the stack (ggml_cuda_mul_mat, ...) if cuda_malloc fails
?
Note:
- by default, ggml would keep the same behavior as today: abort in all cases
- the option would apply only to malloc failures; ggml would still abort in all other cases.
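To make the proposal concrete, here is a rough, self-contained sketch of the control flow I have in mind. Everything prefixed with `sketch_` / `SKETCH_` is a hypothetical stand-in, not the real ggml or CUDA API: the fake allocator simply refuses requests above a fixed budget, and the `sketch_no_abort_on_oom` flag plays the role of the proposed GGML_NO_ABORT_ON_OOM option. The actual change would likely look different once fitted to the CUDA backend.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstdlib>

// Hypothetical stand-in for a status enum; not the real ggml definition.
enum sketch_status {
    SKETCH_STATUS_ALLOC_FAILED = -2,
    SKETCH_STATUS_SUCCESS      = 0,
};

// Hypothetical switch mirroring the proposed GGML_NO_ABORT_ON_OOM option.
// Off by default, so the default behavior stays "abort".
bool sketch_no_abort_on_oom = false;

// Fake device allocator (assumption): fails for requests above a fixed budget.
const size_t SKETCH_DEVICE_BUDGET = 1024;

bool sketch_device_malloc(void ** ptr, size_t size) {
    if (size > SKETCH_DEVICE_BUDGET) {
        *ptr = nullptr;
        return false;
    }
    *ptr = std::malloc(size);
    return *ptr != nullptr;
}

// Allocation wrapper: aborts as today unless the option is enabled,
// in which case the failure is reported to the caller instead.
sketch_status sketch_alloc(void ** ptr, size_t size) {
    assert(ptr != nullptr);
    if (sketch_device_malloc(ptr, size)) {
        return SKETCH_STATUS_SUCCESS;
    }
    if (!sketch_no_abort_on_oom) {
        std::fprintf(stderr, "out of memory allocating %zu bytes\n", size);
        std::abort();
    }
    return SKETCH_STATUS_ALLOC_FAILED;
}

// Stand-in for an op such as a matrix multiplication that needs scratch memory.
sketch_status sketch_mul_mat(size_t scratch_bytes) {
    void * scratch = nullptr;
    sketch_status st = sketch_alloc(&scratch, scratch_bytes);
    if (st != SKETCH_STATUS_SUCCESS) {
        return st; // propagate the failure instead of aborting
    }
    // The real kernel launch would go here.
    std::free(scratch);
    return SKETCH_STATUS_SUCCESS;
}

// Stand-in for graph compute: stops at the first node that fails.
sketch_status sketch_graph_compute(const size_t * node_bytes, int n_nodes) {
    for (int i = 0; i < n_nodes; ++i) {
        sketch_status st = sketch_mul_mat(node_bytes[i]);
        if (st != SKETCH_STATUS_SUCCESS) {
            return st;
        }
    }
    return SKETCH_STATUS_SUCCESS;
}
```

With the flag enabled, a caller can check the value returned by the graph compute call and decide whether to exit, reload, resume or skip. With the flag left off, the first failed allocation still aborts.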
Best
W.