Hi,
I have a question about embeddings.
I am running the latest llama.cpp in a Kubernetes cluster with CUDA support. I am using the model
Mistral-Nemo-Instruct-2407-Q6_K.gguf
Everything works fine with '/completion' requests.
But when I use the llama.cpp web server to compute embeddings, the result confuses me.
For example, using the following curl command to compute an embedding:
$ curl https://llama.cpp.foo.com/embedding -H "Content-Type: application/json" -d '{"input":["Paris"]}'
The response is very fast (below 1 sec), but the server returns a JSON object with 5120 floats!? I guess this is wrong?
The result looks like this one:
[{"index":0,"embedding":[[3.4135327339172363,-1.7873748540878296,....,0.41544172167778015]]}]
Can someone explain to me what I am doing wrong here?
Thanks for any help
===
Ralph