server : vision support via libmtmd #12898

Status: Merged (79 commits, merged on May 9, 2025)
Changes from 13 commits
Commits
466c6cd
server : (experimental) vision support via libmtmd
ngxson Apr 11, 2025
2317e61
mtmd : add more api around mtmd_image_tokens
ngxson Apr 11, 2025
a46b6db
mtmd : add more api around mtmd_image_tokens
ngxson Apr 11, 2025
7ac0b7b
mtmd : ability to calc image hash
ngxson Apr 11, 2025
58c4767
shared_ptr for mtmd_image_tokens
ngxson Apr 12, 2025
d3c3e20
move hash to user-define ID (fixed)
ngxson Apr 12, 2025
a44029a
Merge branch 'xsn/mtmd_image_api' into xsn/server_mtmd
ngxson Apr 13, 2025
5e6c7ba
abstract out the batch management
ngxson Apr 13, 2025
78a76de
Merge branch 'master' into xsn/server_mtmd
ngxson Apr 14, 2025
c734b53
Merge branch 'master' into xsn/server_mtmd
ngxson Apr 21, 2025
a6a3653
small fix
ngxson Apr 21, 2025
f8bc466
refactor logic adding tokens to batch
ngxson Apr 21, 2025
f5420e1
implement hashing image
ngxson Apr 21, 2025
aae2e69
Merge branch 'master' into xsn/server_mtmd
ngxson Apr 23, 2025
cd11585
use FNV hash, now hash bitmap instead of file data
ngxson Apr 23, 2025
8afa952
allow decoding image embedding to be split into batches
ngxson Apr 23, 2025
989730c
rm whitespace
ngxson Apr 23, 2025
19b9fe1
Merge branch 'master' into xsn/server_mtmd
ngxson Apr 24, 2025
2df8c1a
disable some features when mtmd is on
ngxson Apr 24, 2025
b9ef895
fix --no-mmproj-offload
ngxson Apr 25, 2025
add9e21
mtmd_context_params no timings
ngxson Apr 25, 2025
0f39770
Merge branch 'master' into xsn/server_mtmd
ngxson Apr 25, 2025
58100b3
refactor server_inp to server_tokens
ngxson Apr 25, 2025
e82fea8
fix the failing test case
ngxson Apr 25, 2025
4a4f35c
init
ngxson Apr 29, 2025
f6b6517
wip
ngxson Apr 29, 2025
e0806c2
Merge branch 'master' into xsn/mtmd_c_api
ngxson Apr 29, 2025
82f4246
working version
ngxson Apr 29, 2025
f8c27b9
add mtmd::bitmaps
ngxson Apr 29, 2025
3357961
add test target
ngxson Apr 29, 2025
92d2404
rm redundant define
ngxson Apr 29, 2025
111d5af
test: mtmd_input_chunks_free
ngxson Apr 29, 2025
08d0f9c
rm outdated comment
ngxson Apr 29, 2025
a230804
Merge branch 'master' into xsn/mtmd_c_api
ngxson May 2, 2025
863db31
fix merging issue
ngxson May 2, 2025
a0fb701
explicitly create mtmd::input_chunks
ngxson May 2, 2025
6bc7a30
mtmd_input_chunk_copy
ngxson May 2, 2025
4d842eb
add clone()
ngxson May 2, 2025
f91fb97
Merge branch 'master' into xsn/server_mtmd
ngxson May 3, 2025
2cedd18
improve server_input struct
ngxson May 3, 2025
3ee071c
clip : fix confused naming ffn_up and ffn_down
ngxson May 3, 2025
3fbf0bd
rm ffn_i/o/g naming
ngxson May 3, 2025
f3870a6
rename n_embd, n_ff
ngxson May 3, 2025
ae83229
small fix
ngxson May 3, 2025
0009f76
Merge branch 'master' into xsn/clip_ffn_up_down_fix
ngxson May 3, 2025
246a4e0
no check n_ff
ngxson May 3, 2025
57b288f
Merge branch 'xsn/clip_ffn_up_down_fix' into xsn/server_mtmd
ngxson May 3, 2025
5f1fe1b
fix detokenize
ngxson May 3, 2025
06cb595
Merge branch 'master' into xsn/mtmd_c_api
ngxson May 4, 2025
e9f7ff9
add const to various places
ngxson May 4, 2025
049ae24
add warning about breaking changes
ngxson May 4, 2025
91613c0
Merge branch 'xsn/mtmd_c_api' into xsn/server_mtmd
ngxson May 4, 2025
d3fece5
add c api
ngxson May 4, 2025
076e3b9
helper: use mtmd_image_tokens_get_n_pos
ngxson May 4, 2025
574d403
Merge branch 'xsn/mtmd_c_api' into xsn/server_mtmd
ngxson May 4, 2025
036f682
Merge branch 'master' into xsn/server_mtmd
ngxson May 4, 2025
01c623e
fix ctx_shift
ngxson May 4, 2025
a0f2562
fix name shadowing
ngxson May 4, 2025
9149f39
Merge branch 'master' into xsn/server_mtmd
ngxson May 5, 2025
b353038
Merge branch 'master' into xsn/server_mtmd
ngxson May 6, 2025
3304b44
more strict condition
ngxson May 6, 2025
88461f2
support remote image_url
ngxson May 6, 2025
4adce86
Merge branch 'master' into xsn/server_mtmd
ngxson May 6, 2025
a9b21f4
remote image_url log
ngxson May 6, 2025
2f30530
add CI test
ngxson May 6, 2025
5ffde38
do not log base64
ngxson May 6, 2025
aaebc33
add "has_multimodal" to /props
ngxson May 8, 2025
eeda075
remove dangling image
ngxson May 8, 2025
bef122e
speculative: use slot.cache_tokens.insert
ngxson May 8, 2025
7282456
Merge branch 'master' into xsn/server_mtmd
ngxson May 8, 2025
51afc0a
Apply suggestions from code review
ngxson May 9, 2025
f10fc56
rm can_be_detokenized
ngxson May 9, 2025
689035c
on prmpt processing done, assert cache_tokens.size
ngxson May 9, 2025
b2906a9
handle_completions_impl returns void
ngxson May 9, 2025
abfd821
Merge branch 'master' into xsn/server_mtmd
ngxson May 9, 2025
f5fbc03
adapt the new web ui
ngxson May 9, 2025
5fe8d72
update docs and hot topics
ngxson May 9, 2025
b8000fd
rm assert
ngxson May 9, 2025
9ed430c
small fix (2)
ngxson May 9, 2025
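Commit 8afa952 allows the decoding of an image embedding to be split into batches. The commit body is not shown on this page, so the following is only a minimal sketch of the idea. It assumes the embedding is a row-major `[n_tokens x n_embd]` float buffer and that each decode call accepts at most `n_batch` rows. `embd_slice`, `split_embd_into_batches` and `slice_data` are hypothetical names, not libmtmd or llama.cpp API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical: one slice of an image embedding decoded in a single call.
struct embd_slice {
    std::size_t first_token; // index of the first embedding row in this slice
    std::size_t n_tokens;    // number of embedding rows in this slice
};

// Split n_tokens embedding rows into consecutive slices of at most n_batch rows.
std::vector<embd_slice> split_embd_into_batches(std::size_t n_tokens, std::size_t n_batch) {
    assert(n_batch > 0 && "n_batch must be positive");
    std::vector<embd_slice> slices;
    for (std::size_t i = 0; i < n_tokens; i += n_batch) {
        // the last slice takes whatever rows remain
        slices.push_back({i, std::min(n_batch, n_tokens - i)});
    }
    return slices;
}

// Pointer to the first float of a slice inside a row-major [n_tokens x n_embd] buffer.
const float * slice_data(const std::vector<float> & embd, std::size_t n_embd, const embd_slice & s) {
    return embd.data() + s.first_token * n_embd;
}
```

With 600 image tokens and `n_batch = 256`, this yields slices of 256, 256 and 88 rows. In a real decode loop each slice would presumably also need its own position offset; that part is not modeled here.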
Diff view
8 changes: 5 additions & 3 deletions common/arg.cpp
@@ -833,9 +833,11 @@ static bool common_params_parse_ex(int argc, char ** argv, common_params_context
 
 // allow --mmproj to be set from -hf
 // assuming that mmproj is always in the same repo as text model
-if (!params.model.hf_repo.empty() && ctx_arg.ex == LLAMA_EXAMPLE_LLAVA) {
+if (!params.model.hf_repo.empty() && (
+ctx_arg.ex == LLAMA_EXAMPLE_LLAVA || ctx_arg.ex == LLAMA_EXAMPLE_SERVER)) {
 params.mmproj.hf_repo = params.model.hf_repo;
 }
+// TODO @ngxson : this will break non-vision model with -hf, need to fix before merging
 common_params_handle_model(params.mmproj, params.hf_token, "", true);
 
 if (params.escape) {
@@ -2100,14 +2102,14 @@ common_params_context common_params_parser_init(common_params & params, llama_ex
 [](common_params & params, const std::string & value) {
 params.mmproj.path = value;
 }
-).set_examples({LLAMA_EXAMPLE_LLAVA}));
+).set_examples({LLAMA_EXAMPLE_LLAVA, LLAMA_EXAMPLE_SERVER}));
 add_opt(common_arg(
 {"--mmproj-url"}, "URL",
 "URL to a multimodal projector file for LLaVA. see examples/llava/README.md",
 [](common_params & params, const std::string & value) {
 params.mmproj.url = value;
 }
-).set_examples({LLAMA_EXAMPLE_LLAVA}));
+).set_examples({LLAMA_EXAMPLE_LLAVA, LLAMA_EXAMPLE_SERVER}));
 add_opt(common_arg(
 {"--image"}, "FILE",
 "path to an image file. use with multimodal models. Specify multiple times for batching",
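The first `common/arg.cpp` hunk lets the server, like the llava example, inherit the projector's Hugging Face repo from `-hf`. The sketch below reproduces only that condition, using hypothetical simplified types (`example_kind`, `model_source`, `params_sketch`) in place of the real `common_params` and `LLAMA_EXAMPLE_*` values. It deliberately does not model the unconditional `common_params_handle_model(params.mmproj, ...)` call that the TODO flags as breaking non-vision models loaded with `-hf`.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for the few fields this hunk touches.
enum example_kind { EXAMPLE_COMMON, EXAMPLE_LLAVA, EXAMPLE_SERVER };

struct model_source {
    std::string hf_repo; // empty when the model was not given via -hf
};

struct params_sketch {
    model_source model;  // text model
    model_source mmproj; // multimodal projector
};

// Mirrors the new condition: if the text model comes from -hf and the tool is
// llava or server, look for the projector in the same repo.
void inherit_mmproj_repo(params_sketch & params, example_kind ex) {
    if (!params.model.hf_repo.empty() && (ex == EXAMPLE_LLAVA || ex == EXAMPLE_SERVER)) {
        params.mmproj.hf_repo = params.model.hf_repo;
    }
}
```

Other tools (here `EXAMPLE_COMMON`) leave `mmproj.hf_repo` untouched, which matches the hunk's effect of widening the check from llava-only to llava plus server.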
1 change: 1 addition & 0 deletions examples/llava/mtmd.cpp
@@ -29,6 +29,7 @@ struct mtmd_context {
 bool print_timings;
 int n_threads;
 std::string image_marker;
+bool calc_image_hash;
 
 // for minicpmv, we need special tokens in-between slices
 mtmd_slice_tmpl slice_tmpl = MTMD_SLICE_TMPL_NONE;
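The new `calc_image_hash` field relates to commit cd11585 ("use FNV hash, now hash bitmap instead of file data"). A minimal sketch of that approach follows. The commit names FNV but not the variant, so 64-bit FNV-1a is an assumption here, as is the hypothetical `image_id_from_bitmap` helper that hashes `nx * ny * 3` RGB bytes. Hashing decoded pixels rather than file bytes means the same pixels delivered with different metadata or container bytes map to the same ID.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// 64-bit FNV-1a with the standard offset basis and prime.
std::uint64_t fnv1a_64(const std::uint8_t * data, std::size_t len) {
    std::uint64_t hash = 0xcbf29ce484222325ULL; // offset basis
    for (std::size_t i = 0; i < len; ++i) {
        hash ^= data[i];          // FNV-1a: xor first...
        hash *= 0x100000001b3ULL; // ...then multiply by the FNV prime
    }
    return hash;
}

// Hypothetical: derive an image ID from decoded RGB pixels (nx * ny * 3 bytes).
std::string image_id_from_bitmap(const std::vector<std::uint8_t> & rgb, std::uint32_t nx, std::uint32_t ny) {
    assert(rgb.size() == static_cast<std::size_t>(nx) * ny * 3);
    return std::to_string(fnv1a_64(rgb.data(), rgb.size()));
}
```

FNV-1a is fast and simple, which suits cache keys; it is not a cryptographic hash, so IDs derived this way should not be trusted against deliberately colliding inputs.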
1 change: 1 addition & 0 deletions examples/llava/mtmd.h
@@ -87,6 +87,7 @@ MTMD_API void mtmd_free(mtmd_context * ctx);
// 2. (image tokens)
// 3. "<end_of_image>\ndescribe it in detail."
// number of bitmaps must be equal to the number of image markers in the prompt
// the returned value must be freed using mtmd_input_chunks_free()
// this function is thread-safe (shared ctx)
// return values:
// 0 on success
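The header comment requires the tokenizer's output to be released with `mtmd_input_chunks_free()`. In C++, one common way to honor such a contract is a `std::unique_ptr` with a custom deleter. The sketch below uses hypothetical stand-ins (`fake_chunks`, `fake_chunks_init`, `fake_chunks_free`) because the exact signatures at this point in the PR are not shown here; it illustrates the ownership pattern only and is not the wrapper the PR itself adds.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical stand-in for an opaque handle with a dedicated free function,
// mimicking the "must be freed using ..._free()" contract from the header.
struct fake_chunks { int n_chunks; };

static int g_live_chunks = 0; // counts live allocations so the sketch can be checked

fake_chunks * fake_chunks_init(int n) {
    ++g_live_chunks;
    return new fake_chunks{n};
}

void fake_chunks_free(fake_chunks * c) {
    if (c) {
        --g_live_chunks;
        delete c;
    }
}

// Deleter that routes destruction through the dedicated free function.
struct fake_chunks_deleter {
    void operator()(fake_chunks * c) const { fake_chunks_free(c); }
};

// Owning handle: the free function runs exactly once, when the owner is destroyed or reset.
using fake_chunks_ptr = std::unique_ptr<fake_chunks, fake_chunks_deleter>;
```

Because `unique_ptr` is move-only, ownership can be handed between functions without risking a double free, and an early return or exception still releases the chunks.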
4 changes: 3 additions & 1 deletion examples/server/CMakeLists.txt
@@ -34,8 +34,10 @@ endforeach()
 add_executable(${TARGET} ${TARGET_SRCS})
 install(TARGETS ${TARGET} RUNTIME)
 
+target_include_directories(${TARGET} PRIVATE ../llava)
+target_include_directories(${TARGET} PRIVATE ../gguf-hash/deps/sha1) # TODO @ngxson : this is a hacky way to get this working, to be fixed before merging
 target_include_directories(${TARGET} PRIVATE ${CMAKE_SOURCE_DIR})
-target_link_libraries(${TARGET} PRIVATE common ${CMAKE_THREAD_LIBS_INIT})
+target_link_libraries(${TARGET} PRIVATE common mtmd sha1 ${CMAKE_THREAD_LIBS_INIT})
 
 if (LLAMA_SERVER_SSL)
 find_package(OpenSSL REQUIRED)