
server : fix endpoint checks #10135


Merged: 1 commit merged into master on Nov 2, 2024

Conversation

ggerganov (Member)

ref #3815 (comment)

I think 0d6f6a7 messed up the endpoint checks. Per the readme, the --embeddings flag should restrict the server to just the /embeddings endpoint, while --reranking should enable the /rerank endpoint.

ngxson (Collaborator) commented Nov 2, 2024

Hmm, yeah, I think I misunderstood --embedding. I thought it meant "enable embeddings && disable completion".

So just to confirm, the server does support having some slots running embd and some slots running completion at the same time, right?

I'm asking this because I can't find llama_set_causal_attn anywhere in the server.cpp code.

ggerganov (Member, Author)

> So just to confirm, the server does support having some slots running embd and some slots running completion at the same time, right?
>
> I'm asking this because I can't find llama_set_causal_attn anywhere in the server.cpp code.

I'm not really sure what the state of this functionality is, and AFAIK most people use the "embedding + completion" mode just for testing purposes (i.e. to avoid starting 2 separate instances of llama-server). Technically, for getting the embeddings from a LLaMA model for example, you don't need to call llama_set_causal_attn(false). Just llama_set_embeddings(true), which we already do. There are models like GritLM which would require correct calls to llama_set_causal_attn, and this is not supported by llama-server atm.
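The call pattern being contrasted above might look like this sketch. It is pseudocode against the llama.h API, not actual server.cpp code; `ctx`, `batch`, `embd_batch`, and `gen_batch` are assumed to be an initialized `llama_context` and prepared `llama_batch` objects.

```
// Embeddings from a LLaMA-style model: enabling embedding output is
// enough; attention can stay causal.
llama_set_embeddings(ctx, true);
llama_decode(ctx, batch);            // pooled embeddings now available

// A GritLM-style model additionally needs non-causal attention for the
// embedding pass and causal attention for generation:
llama_set_causal_attn(ctx, false);   // embedding pass
llama_decode(ctx, embd_batch);
llama_set_causal_attn(ctx, true);    // back to text generation
llama_decode(ctx, gen_batch);
```

Toggling causal attention per-slot is what llama-server does not do, which is why mixed embedding/completion slots only work for models that tolerate causal attention during the embedding pass.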

ngxson (Collaborator) left a comment:


OK thanks for the explanation. That sounds good.

ggerganov merged commit 4595041 into master on Nov 2, 2024 (59 of 60 checks passed).
ggerganov deleted the gg/server-fix-endpoints branch on November 2, 2024 at 16:34.
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Nov 15, 2024
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Nov 18, 2024