Description
Add an option to build ggml-cuda with JIT kernel compilation using NVRTC.
As of today ggml-cuda is AOT (Ahead Of Time):
all CUDA kernels are compiled ahead of time with the local nvcc for a limited set of NVIDIA archs. The embedded kernels are therefore only runnable on those archs, and ggml-cuda.so can potentially become huge.
Another way to run device/NPU/GPU kernels is JIT (Just In Time):
ggml-cuda would embed the source code of (some) kernels in the lib, link with NVRTC, and compile only the kernels that are actually needed to PTX at runtime, just before their very first execution.
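To make the idea concrete, here is a minimal sketch (not actual ggml code) of the NVRTC flow: the kernel source lives in the binary as a string, gets compiled to PTX at runtime for the local arch, and is then loaded and launched through the CUDA driver API. The `scale_f32` kernel and all names are illustrative only; error checking is mostly omitted.

```cpp
// Build with: g++ jit_sketch.cpp -lcuda -lnvrtc
#include <cstdio>
#include <string>
#include <vector>
#include <cuda.h>
#include <nvrtc.h>

// Hypothetical kernel kept as source in the lib instead of an AOT-compiled cubin.
// extern "C" avoids C++ name mangling so cuModuleGetFunction can find it by name.
static const char *k_scale_src = R"(
extern "C" __global__ void scale_f32(float *x, float v, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= v;
}
)";

int main() {
    cuInit(0);
    CUdevice dev; cuDeviceGet(&dev, 0);
    CUcontext ctx; cuCtxCreate(&ctx, 0, dev);

    // Target the *local* arch instead of a fixed AOT list.
    int cc_major = 0, cc_minor = 0;
    cuDeviceGetAttribute(&cc_major, CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MAJOR, dev);
    cuDeviceGetAttribute(&cc_minor, CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MINOR, dev);
    std::string arch = "--gpu-architecture=compute_"
                       + std::to_string(cc_major) + std::to_string(cc_minor);

    // Compile the kernel source to PTX at runtime.
    nvrtcProgram prog;
    nvrtcCreateProgram(&prog, k_scale_src, "scale.cu", 0, nullptr, nullptr);
    const char *opts[] = { arch.c_str() };
    if (nvrtcCompileProgram(prog, 1, opts) != NVRTC_SUCCESS) {
        size_t log_size; nvrtcGetProgramLogSize(prog, &log_size);
        std::vector<char> log(log_size); nvrtcGetProgramLog(prog, log.data());
        fprintf(stderr, "nvrtc: %s\n", log.data());
        return 1;
    }
    size_t ptx_size; nvrtcGetPTXSize(prog, &ptx_size);
    std::vector<char> ptx(ptx_size); nvrtcGetPTX(prog, ptx.data());
    nvrtcDestroyProgram(&prog);

    // Load the PTX and launch, as would happen just before the first kernel execution.
    CUmodule   mod; cuModuleLoadData(&mod, ptx.data());
    CUfunction fn;  cuModuleGetFunction(&fn, mod, "scale_f32");

    int n = 1024;
    CUdeviceptr d_x; cuMemAlloc(&d_x, n * sizeof(float));   // data init omitted
    float v = 2.0f;
    void *args[] = { &d_x, &v, &n };
    cuLaunchKernel(fn, (n + 255) / 256, 1, 1, 256, 1, 1, 0, nullptr, args, nullptr);
    cuCtxSynchronize();

    cuMemFree(d_x);
    cuModuleUnload(mod);
    cuCtxDestroy(ctx);
    return 0;
}
```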
Advantages: faster ggml-cuda builds, a smaller lib, more archs targeted dynamically, and the ability to parameterize the kernels for the local hardware, which can give better perf.
Drawbacks: the very first execution of a kernel is slower because it requires a runtime compilation; later launches can reuse the compiled kernel from a cache, as sketched below.
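As a sketch of why only the first launch pays the compilation cost, a compiled-kernel cache keyed by kernel name could look like this. `get_kernel` and the `compile` callback are hypothetical helpers (the callback would wrap the NVRTC flow from the sketch above); this is not an existing ggml API.

```cpp
#include <functional>
#include <string>
#include <unordered_map>
#include <cuda.h>

// Return the cached CUfunction for `name`, or invoke `compile` (e.g. the NVRTC
// flow shown earlier) exactly once and cache the result for later launches.
CUfunction get_kernel(const std::string &name,
                      const std::function<CUfunction()> &compile) {
    static std::unordered_map<std::string, CUfunction> cache;
    auto it = cache.find(name);
    if (it != cache.end()) {
        return it->second;            // fast path: already JIT-compiled
    }
    CUfunction fn = compile();        // slow path: only on the very first execution
    cache.emplace(name, fn);
    return fn;
}
```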
Refs:
https://docs.nvidia.com/cuda/nvrtc/
https://github.com/pytorch/pytorch/blob/b0a5d55c584792a504ec18600180e3d1200dfea6/torch/csrc/jit/tensorexpr/cuda_codegen.cpp#L1262
https://github.com/arrayfire/arrayfire/blob/360fefb3551a7c9f91250b0ec894aad76ec6a022/src/backend/cuda/compile_module.cpp#L153
@ggerganov
would you consider some PRs adding this option (no change to the default behavior, which would remain AOT/nvcc)?