Hi,
I have a question about embeddings.
I am running the latest llama.cpp in a Kubernetes cluster with CUDA support. I am using the model
Mistral-Nemo-Instruct-2407-Q6_K.gguf
Everything works fine with '/completion' requests.
But when I use the llama.cpp web server to compute embeddings, the result confuses me.
For example, using the following curl command to compute an embedding:
$ curl https://llama.cpp.foo.com/embedding -H "Content-Type: application/json" -d '{"input":["Paris"]}'
The response is very fast (below 1 sec), but the server returns a JSON object with 5120 floats!? I guess this is wrong?
The result looks like this one:
[{"index":0,"embedding":[[3.4135327339172363,-1.7873748540878296,....,0.41544172167778015]]}]
Can someone explain to me what I am doing wrong here?
Thanks for any help
===
Ralph