
server : fix endpoint checks #10135


Merged: 1 commit merged into master on Nov 2, 2024

Conversation

ggerganov (Member)

ref #3815 (comment)

I think 0d6f6a7 messed up the endpoint checks. Per the readme, the --embeddings flag should restrict the server to just the /embeddings endpoint, while --reranking should enable the /rerank endpoint.

ngxson (Collaborator) commented Nov 2, 2024

Hmm, yeah, I think I misunderstood --embedding. I thought it meant "enable embeddings && disable completion".

So just to confirm, the server does support having some slots running embd and some slots running completion at the same time, right?

I'm asking this because I can't find llama_set_causal_attn anywhere in the server.cpp code.

ggerganov (Member, Author)

> So just to confirm, the server does support having some slots running embd and some slots running completion at the same time, right?
>
> I'm asking this because I can't find llama_set_causal_attn anywhere in the server.cpp code.

I'm not really sure what the state of this functionality is, and AFAIK most people use the "embedding + completion" mode just for testing purposes (i.e. to avoid starting 2 separate instances of llama-server). Technically, for getting the embeddings from a LLaMA model for example, you don't need to call llama_set_causal_attn(false). Just llama_set_embeddings(true), which we already do. There are models like GritLM which would require correct calls to llama_set_causal_attn, and this is not supported by llama-server atm.
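The call pattern being contrasted above might look like this sketch. It is pseudocode against the llama.h API, not actual server.cpp code; `ctx`, `batch`, `embd_batch`, and `gen_batch` are assumed to be an initialized `llama_context` and prepared `llama_batch` objects.

```
// Embeddings from a LLaMA-style model: enabling embedding output is
// enough; attention can stay causal.
llama_set_embeddings(ctx, true);
llama_decode(ctx, batch);            // pooled embeddings now available

// A GritLM-style model additionally needs non-causal attention for the
// embedding pass and causal attention for generation:
llama_set_causal_attn(ctx, false);   // embedding pass
llama_decode(ctx, embd_batch);
llama_set_causal_attn(ctx, true);    // back to text generation
llama_decode(ctx, gen_batch);
```

Toggling causal attention per-slot is what llama-server does not do, which is why mixed embedding/completion slots only work for models that tolerate causal attention during the embedding pass.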

ngxson (Collaborator) left a comment:


OK thanks for the explanation. That sounds good.

ggerganov merged commit 4595041 into master on Nov 2, 2024 (59 of 60 checks passed).
ggerganov deleted the gg/server-fix-endpoints branch on November 2, 2024 at 16:34.
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Nov 15, 2024
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Nov 18, 2024