Replace cortex.llamacpp with minimalist fork of llama.cpp #1728
Comments
I agree that we should align with the llama.cpp upstream, but I have several concerns:
Task list:
- Related tickets need to be tested and verified
- Approach 1:
- Approach 2: Build llama.cpp server as a library and load it into cortex (a minimal sketch follows below)
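For illustration, a minimal sketch of what approach 2 could look like on macOS, assuming the server is built as a shared library exposing a `llama_server_main` entry point. Both the library name and the symbol here are assumptions for the sketch, not actual llama.cpp exports:

```cpp
// Sketch: load a llama.cpp server built as a shared library and call
// a hypothetical entry point. Library name and symbol are illustrative.
#include <dlfcn.h>
#include <cstdio>

int main() {
  void* handle = dlopen("libllama-server.dylib", RTLD_NOW | RTLD_LOCAL);
  if (!handle) {
    std::fprintf(stderr, "dlopen failed: %s\n", dlerror());
    return 1;
  }
  // The entry-point signature is an assumption for this sketch.
  using server_main_fn = int (*)(int argc, char** argv);
  auto server_main =
      reinterpret_cast<server_main_fn>(dlsym(handle, "llama_server_main"));
  if (!server_main) {
    std::fprintf(stderr, "dlsym failed: %s\n", dlerror());
    dlclose(handle);
    return 1;
  }
  const char* argv_srv[] = {"llama-server", "--port", "3928"};
  int rc = server_main(3, const_cast<char**>(argv_srv));
  dlclose(handle);
  return rc;
}
```

Note the trade-off this implies: the server shares the host process, so a crash in the engine takes cortex down with it, which is what the process-based key change at the bottom of this issue avoids.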
Specs changes:
- Diagram:
- Task list:
- Engine variants: on macOS we'd like to support macos-12, so we implemented a filter for the llama.cpp server variants in cortex (a minimal sketch follows below).
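A minimal sketch of such a variant filter, assuming release assets follow the upstream `llama-<build>-bin-<os>-<arch>.zip` naming; the matching logic and asset names here are simplified assumptions, not the actual cortex implementation:

```cpp
// Sketch: pick the release assets usable on the current platform by
// substring-matching an os/arch tag against upstream asset names.
#include <iostream>
#include <string>
#include <vector>

std::vector<std::string> FilterVariants(const std::vector<std::string>& assets,
                                        const std::string& os_arch) {
  std::vector<std::string> matched;
  for (const auto& a : assets) {
    if (a.find(os_arch) != std::string::npos) matched.push_back(a);
  }
  return matched;
}

int main() {
  const std::vector<std::string> assets = {
      "llama-b4000-bin-macos-arm64.zip",
      "llama-b4000-bin-macos-x64.zip",
      "llama-b4000-bin-ubuntu-x64.zip",
  };
  for (const auto& a : FilterVariants(assets, "macos-arm64"))
    std::cout << a << "\n";  // -> llama-b4000-bin-macos-arm64.zip
}
```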
QA checklist
- OS:
- Engine variant:
- Scope:
  - CLI: Installation, Engine management, Model Running
  - API: Engine management, Running Models
- Additional requirements
Testing completed; this issue should be closed.
Goal
Can we consider refactoring `llamacpp-engine` to use the upstream server implementation, and maintain a fork with our improvements to speech, vision, etc.? This is especially relevant if we do a C++ implementation of whisperVQ in the future.
Potential issues
- `cortex engines llama.cpp update` -> updates llama.cpp
- `avx-512` variants for `janhq/llama.cpp` (i.e. build scripts)
- Aligning `janhq/llama.cpp` release names with `ggml-org/llama.cpp`
- Supporting `logit_bias`, `n`, etc. by either upstreaming them or handling them in Cortex Server (a sketch of the latter follows below)
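As one example of handling a parameter in Cortex Server rather than upstream, the OpenAI-style `n` parameter (number of choices) could be emulated by fanning the request out to a single-completion backend. `backend` below is a hypothetical stand-in for an HTTP call to `llama-server`, not a real llama.cpp API:

```cpp
// Sketch: emulate the OpenAI `n` parameter on top of a backend that
// returns one completion per request.
#include <functional>
#include <string>
#include <vector>

std::vector<std::string> CompleteN(
    const std::function<std::string(const std::string&)>& backend,
    const std::string& request_json, int n) {
  std::vector<std::string> choices;
  choices.reserve(n);
  for (int i = 0; i < n; ++i)
    choices.push_back(backend(request_json));  // one choice per call
  return choices;
}

int main() {
  // Stub backend for illustration only.
  auto backend = [](const std::string&) {
    return std::string(R"({"content":"..."})");
  };
  auto choices = CompleteN(backend, R"({"prompt":"Hello"})", 3);
  return choices.size() == 3 ? 0 : 1;
}
```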
Key Changes
- Use `llama-server` instead of the Drogon server that we use in `cortex.llamacpp`
- Run `llama.cpp` as a separate process instead of a `dylib` (better stability, parallelism); a minimal sketch follows below
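A minimal POSIX sketch of the process-based approach: the parent forks and execs a standalone `llama-server` binary, so a crash in the engine cannot take down cortex. The binary path and supervision logic are illustrative; `-m` and `--port` are real llama-server flags:

```cpp
// Sketch: run llama-server as a supervised child process instead of
// loading it in-process as a dylib.
#include <sys/wait.h>
#include <unistd.h>
#include <cstdio>

int main() {
  pid_t pid = fork();
  if (pid < 0) {
    std::perror("fork");
    return 1;
  }
  if (pid == 0) {
    // Child: exec the standalone server. A crash here is isolated
    // from the parent process, unlike an in-process dylib.
    execlp("llama-server", "llama-server",
           "-m", "model.gguf", "--port", "3928", (char*)nullptr);
    std::perror("execlp");
    _exit(127);
  }
  // Parent: supervise the child; restart logic could go here.
  int status = 0;
  waitpid(pid, &status, 0);
  std::printf("llama-server exited with status %d\n", status);
}
```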