Experimental playground for Gemma 3 vision #12348
ngxson started this conversation in Show and tell
Replies: 2 comments
- No need to build if you installed from brew, btw.
- Linking new vision support here for visibility (heh): https://github.com/ggml-org/llama.cpp/blob/master/docs/multimodal.md
---

I'm mirroring the guide from #12344 here for more visibility.
To support the Gemma 3 vision model, a new binary `llama-gemma3-cli` was added to provide a playground; it supports both a chat mode and a simple completion mode.

> [!IMPORTANT]
> Please note that this is not intended to be a production-ready product; it mostly acts as a demo. Please refer to #11292 for the future plan of vision support.
## How to try this?
### Step 1: Get the text model

Download it from: https://huggingface.co/collections/ggml-org/gemma-3-67d126315ac810df1ad9e913
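As a concrete sketch, one way to fetch a pre-quantized model is with `huggingface-cli` (the repo and file names below, e.g. `ggml-org/gemma-3-4b-it-GGUF`, are assumptions; check the collection page for the actual names). The `mmproj` file from Step 2, Option 1 can be fetched the same way:

```sh
# Repo and file names are assumptions -- verify them on the HF collection page.
pip install -U "huggingface_hub[cli]"
huggingface-cli download ggml-org/gemma-3-4b-it-GGUF \
  gemma-3-4b-it-Q4_K_M.gguf --local-dir ./models
```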
### Step 2: Get the mmproj (multi-modal projection) model

**Option 1:** Download the pre-quantized version from HF: https://huggingface.co/collections/ggml-org/gemma-3-67d126315ac810df1ad9e913
(You must download both the text model and the `mmproj` file.)

**Option 2:** Convert it yourself
We will need the `model.gguf` generated from the `convert_hf_to_gguf.py` script above, plus the vision tower saved in `mmproj.gguf`.

Firstly, get the `mmproj.gguf` file:
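The exact conversion command was not captured in this copy of the post. As a hedged sketch, assuming the encoder converter added alongside #12344 lives under `examples/llava/` (the script name and arguments below are assumptions; check the repo for the real converter):

```sh
# Hypothetical invocation -- script name and arguments are assumptions;
# look under examples/llava/ in the llama.cpp repo for the actual script.
python examples/llava/gemma3_convert_encoder_to_gguf.py \
  /path/to/gemma-3-4b-it   # HF model directory; should emit mmproj.gguf
```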
### Step 3: Compile and run

Clone this repo and compile `llama-gemma3-cli`:

```sh
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-gemma3-cli
```
Run it:
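A minimal sketch of both modes follows; `-m` and `--mmproj` match the post's description, but the single-turn `--image`/`-p` flags are assumptions, so run the binary with `--help` to confirm:

```sh
# Chat mode (default):
./build/bin/llama-gemma3-cli -m models/gemma-3-4b-it-Q4_K_M.gguf \
  --mmproj models/mmproj.gguf

# Single-turn completion with an image -- flags are assumptions, check --help:
./build/bin/llama-gemma3-cli -m models/gemma-3-4b-it-Q4_K_M.gguf \
  --mmproj models/mmproj.gguf --image ./test.jpg -p "Describe this image."
```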
Example output: