Replace cortex.llamacpp with minimalist fork of llama.cpp #1728

Closed · 8 tasks · dan-menlo opened this issue Nov 26, 2024 · 7 comments


@dan-menlo
Contributor

dan-menlo commented Nov 26, 2024

Goal

  • Goal: Can we have a minimalist fork of llama.cpp as llamacpp-engine
    • cortex.cpp's desktop focus means Drogon's features are unused
    • We should contribute our vision and multimodal work upstream as a form of llama.cpp server
    • Very clear Engines abstraction (e.g. to support OpenVINO etc. in the future)
  • Goal: Contribute upwards to llama.cpp
    • Vision, multimodal
    • May not be possible if the vision and audio encoders are Python-runtime-based

Can we consider refactoring llamacpp-engine to use the upstream server implementation and maintaining a fork with our improvements to speech, vision, etc.? This is especially relevant if we do a C++ implementation of whisperVQ in the future.

Potential issues

  • cortex engines llama.cpp update -> updates llama.cpp
    • We still need to build avx-512 variants for janhq/llama.cpp (i.e. build scripts)
    • We should align the janhq/llama.cpp release names with ggml-org/llama.cpp
    • Trigger automatic CI/CD to build
    • We can also ask GG if we can donate compute towards builds
  • Deprecating llava support
  • Handling existing API parameters (logit_bias, n, etc.) either by upstreaming them or implementing them in the Cortex server
  • Update Documentation
  • DevRel @ramonpzg
    • Cortex builds on llamacpp-server (and we will contribute in the future)
    • Why do we need to build so many different variants of llama.cpp (AVX-512, AVX2, etc.)?
    • GG -> can we contribute Menlo Cloud to the llama.cpp project (built on Intel CPUs)?

Key Changes

  • Use llama-server instead of the Drogon server we currently use in cortex.llamacpp
  • Use a spawned llama.cpp process instead of a dylib (better stability and parallelism)
    • However, we will effectively need to build a process manager (a rough sketch follows below)
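A minimal POSIX-only sketch of what the spawn side of such a process manager could look like. The binary and model paths are placeholders, and how cortex resolves them is an assumption; the -m/--host/--port flags are standard llama-server options.

// Minimal POSIX sketch: spawn llama-server as a child process.
// Paths here are placeholders; -m/--host/--port are standard llama-server flags.
#include <stdexcept>
#include <string>
#include <vector>
#include <sys/types.h>
#include <unistd.h>

pid_t SpawnLlamaServer(const std::string& binary,
                       const std::string& model_path,
                       int port) {
  std::vector<std::string> args = {binary, "-m", model_path,
                                   "--host", "127.0.0.1",
                                   "--port", std::to_string(port)};
  std::vector<char*> argv;
  for (auto& a : args) argv.push_back(a.data());
  argv.push_back(nullptr);

  pid_t pid = fork();
  if (pid == 0) {                 // child: replace the process image with llama-server
    execv(binary.c_str(), argv.data());
    _exit(127);                   // only reached if execv failed
  }
  if (pid < 0) throw std::runtime_error("fork() failed");
  return pid;                     // parent keeps the pid for lifecycle control
}

On Windows the same role would fall to CreateProcess; in either case the returned pid or handle is what the process manager has to track.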
@dan-menlo dan-menlo added the type: epic A major feature or initiative label Nov 26, 2024
@dan-menlo dan-menlo added this to Menlo Nov 26, 2024
@github-project-automation github-project-automation bot moved this to Investigating in Menlo Nov 26, 2024
@vansangpfiev
Contributor

I agree that we should align with the llama.cpp upstream, but I have several concerns:

  • Drogon is part of cortex.cpp; we have already removed it from the llama-cpp engine. If we remove Drogon from cortex.cpp as well, we will need to find a replacement, which will be costly.
  • Repository Structure: Forking the server implementation will necessitate changes to our repository structure, since we currently use llama.cpp as a submodule.
  • Our current version differs significantly from the upstream version, which will require considerable time for refactoring.

@gabrielle-ong gabrielle-ong added this to the v1.0.5 milestone Nov 28, 2024
@gabrielle-ong gabrielle-ong removed this from the v1.0.5 milestone Nov 28, 2024
@github-project-automation github-project-automation bot moved this from Investigating to QA in Menlo Dec 15, 2024
@dan-menlo dan-menlo reopened this Dec 15, 2024
@github-project-automation github-project-automation bot moved this from QA to In Progress in Menlo Dec 15, 2024
@dan-menlo dan-menlo changed the title epic: llamacpp-engine to align with llama.cpp upstream roadmap: llamacpp-engine to align with llama.cpp upstream Dec 15, 2024
@vansangpfiev
Contributor

vansangpfiev commented Dec 23, 2024

Tasklist:

  • Fork llama.cpp and try to use llama.cpp server
  • Split vision model flow with chat model flow
  • llama.cpp server should be a new process
  • Test with OpenAI API compatible features that Alex and James added
  • CI
  • Docs

Related tickets need to be tested and verified:

Approach 1: cortex.llamacpp spawns llama.cpp server as a new process

  • pros: can directly use llama.cpp server binary
  • cons: spawning a new process is expensive, and the process lifetime is harder to control (see the liveness-check sketch below)

(diagram omitted)
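On the lifetime-control concern: the parent has to notice when the spawned server exits on its own. A minimal POSIX sketch of a non-blocking liveness check (illustrative only, not the actual cortex.llamacpp code):

// Non-blocking liveness check for a spawned llama-server child (POSIX sketch).
#include <sys/types.h>
#include <sys/wait.h>

bool ChildStillRunning(pid_t pid) {
  int status = 0;
  pid_t r = waitpid(pid, &status, WNOHANG);
  if (r == 0) return true;   // child exists and has not changed state yet
  return false;              // r == pid: exited or was signalled; r == -1: no such child
}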

Approach 2: Build llama.cpp server as a library and load it into cortex.llamacpp process

  • pros: llama.cpp server can be embedded into cortex.llamacpp
  • cons: we will need to apply a patch to build the llama.cpp server as a library (see the loading sketch below)

(diagrams omitted)
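A sketch of what loading a patched, library-built llama.cpp server could look like. The library name and the llama_server_start entry point are hypothetical; upstream does not ship the server as a library, which is exactly why the patch mentioned above would be needed.

// Approach 2 sketch: load a (hypothetical) llama-server shared library into
// the cortex.llamacpp process and resolve its entry point.
#include <dlfcn.h>
#include <stdexcept>
#include <string>

using ServerStartFn = int (*)(int argc, char** argv);

ServerStartFn LoadServerEntryPoint(const std::string& lib_path) {
  void* handle = dlopen(lib_path.c_str(), RTLD_NOW | RTLD_LOCAL);
  if (handle == nullptr) throw std::runtime_error(dlerror());

  // "llama_server_start" is a made-up symbol for illustration.
  auto fn = reinterpret_cast<ServerStartFn>(dlsym(handle, "llama_server_start"));
  if (fn == nullptr) throw std::runtime_error("entry point not found");
  return fn;  // the caller would run this on its own thread inside cortex.llamacpp
}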

@vansangpfiev vansangpfiev moved this from In Progress to Eng Review in Menlo Jan 2, 2025
@dan-menlo dan-menlo moved this from Eng Review to QA in Menlo Jan 10, 2025
@TC117 TC117 added this to the v1.0.9 milestone Jan 13, 2025
@TC117

TC117 commented Jan 16, 2025

Engine variants are not listed on Windows the way they are on Linux:
(screenshot omitted)
Working on Linux:
(screenshot omitted)

Can't load a model with the CUDA variant:

PS C:\WINDOWS\system32> cortex-nightly.exe run tinyllama
Starting server ...
Set log level to INFO
Host: 127.0.0.1 Port: 39281
Server started
API Documentation available at: http://127.0.0.1:39281
Model failed to start: Failed to load model
Error: Failed to start model
PS C:\WINDOWS\system32>

@TC117 TC117 moved this from QA to Completed in Menlo Feb 6, 2025
@vansangpfiev vansangpfiev modified the milestones: v1.0.9, v1.0.12 Mar 11, 2025
@vansangpfiev vansangpfiev moved this from Completed to In Progress in Menlo Mar 11, 2025
@vansangpfiev
Contributor

vansangpfiev commented Mar 11, 2025

Specs changes:

  • Move all OpenAI API compatibility from engine to cortex
  • Spawn llama-server (and maybe llava as well) directly from cortex (a command-line sketch follows below)
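A sketch of the command line cortex could hand to the spawned llama-server. The ModelSettings struct and how it is filled from model.yaml are assumptions; the flags themselves (-m, --host, --port, -c, -ngl) are real llama-server options.

// Sketch: build the argument list for the llama-server process cortex spawns.
#include <string>
#include <vector>

struct ModelSettings {        // assumed shape; real values would come from model.yaml
  std::string model_path;     // path to the .gguf file
  int port;                   // port the child server listens on
  int ctx_size;               // context length
  int gpu_layers;             // layers to offload to the GPU
};

std::vector<std::string> BuildLlamaServerArgs(const std::string& binary,
                                              const ModelSettings& s) {
  return {binary,
          "-m",     s.model_path,
          "--host", "127.0.0.1",
          "--port", std::to_string(s.port),
          "-c",     std::to_string(s.ctx_size),
          "-ngl",   std::to_string(s.gpu_layers)};
}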

Diagram:

(diagram omitted)

Task list:

  • Fork of llama.cpp
  • Engine management (Linux, Windows, macOS)
  • OpenAI API compatibility
  • CIs
  • Verify with QA checklists
  • Docs

Engine variants:

  • ubuntu-arm64
  • linux-avx-cuda-cu11.7-x64
  • linux-avx-cuda-cu12.0-x64
  • linux-avx-x64
  • linux-avx2-cuda-cu11.7-x64
  • linux-avx2-cuda-cu12.0-x64
  • ubuntu-x64
  • linux-avx512-cuda-cu11.7-x64
  • linux-avx512-cuda-cu12.0-x64
  • linux-avx512-x64
  • linux-noavx-cuda-cu11.7-x64
  • linux-noavx-cuda-cu12.0-x64
  • linux-noavx-x64
  • ubuntu-vulkan-x64
  • macos-arm64
  • macos-x64
  • win-avx-cuda-cu11.7-x64
  • win-avx-cuda-cu12.0-x64
  • win-avx-x64
  • win-avx2-cuda-cu11.7-x64
  • win-avx2-cuda-cu12.0-x64
  • win-avx2-x64
  • win-avx512-cuda-cu11.7-x64
  • win-avx512-cuda-cu12.0-x64
  • win-avx512-x64
  • win-noavx-cuda-cu11.7-x64
  • win-noavx-cuda-cu12.0-x64
  • win-noavx-x64
  • win-vulkan-x64

For macOS, we'd like to support macos-12, so we implemented a filter for the llama.cpp server in cortex.

@ramonpzg ramonpzg added this to Jan Mar 13, 2025
@dan-menlo dan-menlo changed the title roadmap: llamacpp-engine to align with llama.cpp upstream epic: Replace cortex.llamacpp with minimalist fork of llama.cpp Mar 13, 2025
@vansangpfiev vansangpfiev moved this to In Progress in Jan Mar 17, 2025
@ramonpzg ramonpzg changed the title epic: Replace cortex.llamacpp with minimalist fork of llama.cpp Replace cortex.llamacpp with minimalist fork of llama.cpp Mar 18, 2025
@ramonpzg ramonpzg removed this from Menlo Mar 18, 2025
@ramonpzg ramonpzg modified the milestones: v1.0.12, Caffeinated Sloth Mar 18, 2025
@github-project-automation github-project-automation bot moved this to Investigating in Menlo Mar 18, 2025
@ramonpzg ramonpzg added epic and removed type: epic A major feature or initiative labels Mar 18, 2025
@dan-menlo
Contributor Author

  • Can ship early next week

@vansangpfiev vansangpfiev moved this from Investigating to In Progress in Menlo Mar 24, 2025
@vansangpfiev
Contributor

vansangpfiev commented Mar 25, 2025

QA-checklist

OS

  • Windows 11
  • Ubuntu 24, 22
  • Mac Silicon OS 14/15
  • Mac Intel

Engine variant:

  • ubuntu-arm64
  • linux-avx-cuda-cu11.7-x64
  • linux-avx-cuda-cu12.0-x64
  • linux-avx-x64
  • linux-avx2-cuda-cu11.7-x64
  • linux-avx2-cuda-cu12.0-x64
  • ubuntu-x64
  • linux-avx512-cuda-cu11.7-x64
  • linux-avx512-cuda-cu12.0-x64
  • linux-avx512-x64
  • linux-noavx-cuda-cu11.7-x64
  • linux-noavx-cuda-cu12.0-x64
  • linux-noavx-x64
  • ubuntu-vulkan-x64
  • macos-arm64
  • macos-x64
  • win-avx-cuda-cu11.7-x64
  • win-avx-cuda-cu12.0-x64
  • win-avx-x64
  • win-avx2-cuda-cu11.7-x64
  • win-avx2-cuda-cu12.0-x64
  • win-avx2-x64
  • win-avx512-cuda-cu11.7-x64
  • win-avx512-cuda-cu12.0-x64
  • win-avx512-x64
  • win-noavx-cuda-cu11.7-x64
  • win-noavx-cuda-cu12.0-x64
  • win-noavx-x64
  • win-vulkan-x64

Scope:

CLI

Installation

  • it should install with local installer (default; no internet required during installation, all dependencies bundled)
  • it should install with network installer
  • it should install 2 binaries (cortex and cortex-server) [mac: binaries in /usr/local/bin]
  • it should install with correct folder permissions
  • it should install with folders: /engines /logs (no /models folder until model pull)
  • It should install with Docker image https://cortex.so/docs/installation/docker/

Engine management:

  • llama.cpp should be installed by default
  • it should run gguf models on llamacpp
  • it should list engines
  • it should get engines
  • it should install engines (latest version if not specified)
  • it should install engines (with specified variant and version)
  • it should get default engine
  • it should set default engine (with specified variant/version)
  • it should load engine
  • it should unload engine
  • it should update engine (to latest version)
  • it should update engine (to specified version)
  • it should uninstall engines
  • it should gracefully continue engine installation if interrupted halfway (partial download)
  • it should gracefully handle when users try to CRUD incompatible engines (No variant found for xxx)

Model Running

  • cortex run <cortexso model> - if no local models detected, shows pull model menu
  • cortex run - if local model detected, runs the local model
  • cortex run - if multiple local models detected, shows a list of local models (from multiple model sources, e.g. cortexso, HF authors) for users to select from (via regex search)
  • cortex run <invalid model id> should gracefully return Model not found!
  • run should autostart server
  • cortex run <model> starts interactive chat (by default)
  • cortex run <model> -d runs in detached mode
  • cortex models start <model>
  • terminating stdin or exit() should exit the interactive chat

API

Engine management

  • List engines: GET /v1/engines
  • Get engine: GET /v1/engines/{name}
  • Install engine: POST /v1/engines/install/{name}
  • Get default engine variant/version: GET /v1/engines/{name}/default
  • Set default engine variant/version: POST /v1/engines/{name}/default
  • Load engine: POST /v1/engines/{name}/load
  • Unload engine: DELETE /v1/engines/{name}/load
  • Update engine: POST /v1/engines/{name}/update
  • Uninstall engine: DELETE /v1/engines/install/{name}

Running Models

  • List models: GET /v1/models
  • Start model: POST /v1/models/start
  • Stop model: POST /v1/models/stop
  • Get model: GET /v1/models/{id}
  • Delete model: DELETE /v1/models/{id}
  • Update model: PATCH /v1/models/{model} updates model.yaml params

Additional requirements

  • Cortex spawns a new child process when starting a model
  • Cortex terminates the child process when stopping a model
  • Cortex terminates all child processes when it stops (see the shutdown sketch below)
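One way the last requirement could be met (POSIX-only sketch; the Windows path would use TerminateProcess): send SIGTERM to every tracked child, wait briefly, then fall back to SIGKILL and reap.

// Sketch: terminate all tracked child processes when cortex shuts down.
#include <chrono>
#include <thread>
#include <vector>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>

void TerminateChildren(const std::vector<pid_t>& children) {
  for (pid_t pid : children) kill(pid, SIGTERM);        // ask politely first

  auto deadline = std::chrono::steady_clock::now() + std::chrono::seconds(5);
  for (pid_t pid : children) {
    while (waitpid(pid, nullptr, WNOHANG) == 0 &&
           std::chrono::steady_clock::now() < deadline) {
      std::this_thread::sleep_for(std::chrono::milliseconds(100));
    }
    if (waitpid(pid, nullptr, WNOHANG) == 0) {          // still alive: force it
      kill(pid, SIGKILL);
      waitpid(pid, nullptr, 0);                         // reap to avoid zombies
    }
  }
}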

@vansangpfiev vansangpfiev moved this from In Progress to Eng Review in Menlo Mar 25, 2025
@vansangpfiev vansangpfiev moved this from Eng Review to QA in Menlo Mar 25, 2025
@david-menloai david-menloai self-assigned this Mar 25, 2025
@david-menloai

Testing completed; this should be closed.

@github-project-automation github-project-automation bot moved this from In Progress to Done in Jan Apr 2, 2025