Experimental playground for Gemma 3 vision #12348
ngxson started this conversation in Show and tell
Replies: 2 comments
- No need to build if you installed from brew, btw.
- Linking new vision support here for visibility (heh): https://github.com/ggml-org/llama.cpp/blob/master/docs/multimodal.md
---

I'm mirroring the guide from #12344 here for more visibility.
To support the Gemma 3 vision model, a new binary `llama-gemma3-cli` was added to provide a playground; it supports both a chat mode and a simple completion mode.

> [!IMPORTANT]
> Please note that this is not intended to be a production-ready product; it mostly acts as a demo. Please refer to #11292 for the future plan of vision support.
## How to try this?
### Step 1: Get the text model

Download it from: https://huggingface.co/collections/ggml-org/gemma-3-67d126315ac810df1ad9e913
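As a concrete sketch, one way to fetch a pre-quantized model is with `huggingface-cli` (the repo and file names below, e.g. `ggml-org/gemma-3-4b-it-GGUF`, are assumptions; check the collection page for the actual names). The `mmproj` file from Step 2, Option 1 can be fetched the same way:

```sh
# Repo and file names are assumptions -- verify them on the HF collection page.
pip install -U "huggingface_hub[cli]"
huggingface-cli download ggml-org/gemma-3-4b-it-GGUF \
  gemma-3-4b-it-Q4_K_M.gguf --local-dir ./models
```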
### Step 2: Get the mmproj (multi-modal projection) model

**Option 1:** Download the pre-quantized version from HF: https://huggingface.co/collections/ggml-org/gemma-3-67d126315ac810df1ad9e913
(You must download both the text model and the `mmproj` file.)

**Option 2:** Convert it yourself
We will need the `model.gguf` generated from the `convert_hf_to_gguf.py` script above, plus the vision tower saved in `mmproj.gguf`.

Firstly, get the `mmproj.gguf` file:
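The exact conversion command was not captured in this copy of the post. As a hedged sketch, assuming the encoder converter added alongside #12344 lives under `examples/llava/` (the script name and arguments below are assumptions; check the repo for the real converter):

```sh
# Hypothetical invocation -- script name and arguments are assumptions;
# look under examples/llava/ in the llama.cpp repo for the actual script.
python examples/llava/gemma3_convert_encoder_to_gguf.py \
  /path/to/gemma-3-4b-it   # HF model directory; should emit mmproj.gguf
```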
### Step 3: Compile and run

Clone this repo and compile `llama-gemma3-cli`:

```sh
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-gemma3-cli
```
Run it:
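A minimal sketch of both modes follows; `-m` and `--mmproj` match the post's description, but the single-turn `--image`/`-p` flags are assumptions, so run the binary with `--help` to confirm:

```sh
# Chat mode (default):
./build/bin/llama-gemma3-cli -m models/gemma-3-4b-it-Q4_K_M.gguf \
  --mmproj models/mmproj.gguf

# Single-turn completion with an image -- flags are assumptions, check --help:
./build/bin/llama-gemma3-cli -m models/gemma-3-4b-it-Q4_K_M.gguf \
  --mmproj models/mmproj.gguf --image ./test.jpg -p "Describe this image."
```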
Example output: