examples : allow extracting embeddings from decoder contexts #13797

ggerganov · 2025-05-26T09:22:47Z

fix #13795
cont #13108
ref #13108 (comment)

Sometimes it is necessary to extract embeddings from a decoder-only model. To support this, update the examples to call llama_decode() instead of llama_encode() when computing embeddings. The call internally falls back to llama_encode() if the model is encoder-only.
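To illustrate the fallback described above, here is a toy sketch of the dispatch. All names (`toy_arch`, `toy_context`, `toy_encode`, `toy_decode`) are hypothetical stand-ins invented for this illustration; this is not the actual llama.cpp code, and the real logic inside llama_decode() may differ.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration only, NOT the llama.cpp types.
enum class toy_arch { encoder_only, decoder_only };

struct toy_context {
    toy_arch arch;
    std::vector<std::string> trace; // records which path handled each call
};

// Toy encoder path (assumed to need no KV cache).
int toy_encode(toy_context & ctx, const std::vector<int> & tokens) {
    assert(!tokens.empty());
    ctx.trace.push_back("encode");
    return 0;
}

// Toy decode entry point: callers always use this one function, and it
// falls back to the encoder path when the model is encoder-only.
int toy_decode(toy_context & ctx, const std::vector<int> & tokens) {
    if (ctx.arch == toy_arch::encoder_only) {
        return toy_encode(ctx, tokens);
    }
    assert(!tokens.empty());
    ctx.trace.push_back("decode");
    return 0;
}
```

With this shape, an example program can call the decode entry point for both BERT-like and decoder-only models and does not need to branch on the model type itself.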

ggml-ci

aviallon · 2025-05-26T09:40:53Z

@ggerganov does this mean that models that benefited from cache-less embeddings will still benefit from the feature introduced in #13108?

ggerganov · 2025-05-26T09:44:02Z

@aviallon Yes, this change should not modify the behavior for encoder-only (a.k.a. cache-less) models like BERT.

aviallon · 2025-05-26T09:53:45Z

@ggerganov @ngxson you just earned a new sponsor.

examples : allow extracting embeddings from decoder contexts

4d8fd73

ggml-ci

ggerganov requested a review from ngxson as a code owner May 26, 2025 09:22

ggerganov mentioned this pull request May 26, 2025

context : allow cache-less context for embeddings #13108

Merged

github-actions bot added examples server labels May 26, 2025

ggerganov mentioned this pull request May 26, 2025

Eval bug: Output NAN when use Qwen3 embedding models with FP16 #13795

Closed

ngxson approved these changes May 26, 2025

ggerganov merged commit 79c137f into master May 26, 2025
50 of 53 checks passed

ggerganov deleted the gg/embedding-fix-non-embd-use branch May 26, 2025 11:03
