Cortex.cpp: model.yaml Format #1123
dan-menlo started this conversation in Architecture Specs · 3 comments · 2 replies
-
Questions @nguyenhoangthuan99 @sangjanai @namchuai
See related issues above.
-
Here is the default setting for a GGUF model; other information will be inferred from the GGUF metadata.
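A minimal sketch of what such a default could look like, assuming only the fields below need to be stated explicitly; the field names mirror the proposal in the next comment, and the metadata keys named in the comments are the standard GGUF spec keys, not confirmed Cortex behavior:

```yaml
# Illustrative sketch, not the shipped default.
# Fields omitted here could plausibly be inferred from GGUF metadata, e.g.:
#   ctx_len         <- llama.context_length
#   prompt_template <- tokenizer.chat_template
#   stop            <- tokenizer.ggml.eos_token_id (resolved to its token text)
model: llama3.1
engine: cortex.llamacpp
files:
  - ./models/llama3.1/model.gguf
```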
-
@nguyenhoangthuan99 Can we tame this complexity by defining the following:
```yaml
name: Llama 3.1
model: llama3.1
version: 1
stop:
  - <|end_of_text|>
  - <|eot_id|>
  - <|eom_id|>
top_p: 0.9
temperature: 0.6
frequency_penalty: 0
presence_penalty: 0
max_tokens: 8192
stream: true
ngl: 33
ctx_len: 8192
engine: cortex.llamacpp
prompt_template: |+
  <|begin_of_text|><|start_header_id|>system<|end_header_id|>
  {system_message}<|eot_id|><|start_header_id|>user<|end_header_id|>
  {prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>
files:
  - /Users/nicolezhu/cortex/models/llama3.1/model.gguf
id: llama3.1
created: 1722850990703
object: model
owned_by: ''
```
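For concreteness, substituting the hypothetical values `system_message = "You are a helpful assistant."` and `prompt = "Hello!"` into the `prompt_template` above yields:

```
<|begin_of_text|><|start_header_id|>system<|end_header_id|>
You are a helpful assistant.<|eot_id|><|start_header_id|>user<|end_header_id|>
Hello!<|eot_id|><|start_header_id|>assistant<|end_header_id|>
```

The `|+` block scalar keeps the trailing newline, so the rendered prompt ends exactly where the assistant's completion is expected to begin.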
-
Goal

| Format | Spec |
| --- | --- |
| GGUF | TBA |
| TensorRT-LLM | TBA |
| ONNX | TBA |

Key Epics

Related

- model.yaml for Model Downloaded via URL jan#3558