Multimodal support in llama-server for Gemma 3
#12885
andportnoy started this conversation in General
-
@ngxson Thank you so much for continuing to push multimodal support forward with PRs such as #12849. Is support in llama-server on your roadmap, in particular for models like Gemma 3? What would implementing it involve, given the new libmtmd library? Thank you again for your work.
-
Bringing mtmd to the server is easy; the hard part is managing the KV cache with non-text tokens across requests. Currently we use a common-prefix algorithm to determine how many tokens in the KV cache can be reused, but computing a common prefix over an image is obviously not that simple. I will have a look into it today and open a PR so everyone can discuss.
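
For readers following along, here is a minimal sketch of the problem being described, using hypothetical types and names (not llama-server's actual data structures): text tokens can be compared one by one to find a reusable prefix, but an image chunk occupies a block of KV cells that can only be reused all-or-nothing, for example keyed by a hash of the input image.

```cpp
// Hypothetical sketch, not llama.cpp's real API: why common-prefix KV reuse
// gets harder once a prompt contains image chunks.
#include <algorithm>
#include <cstdint>
#include <cstddef>
#include <vector>

// A prompt is a sequence of chunks: either a single text token or an
// opaque image embedding spanning several KV cells.
struct Chunk {
    bool     is_image;   // false = text token, true = image chunk
    int32_t  token;      // valid when is_image == false
    uint64_t image_hash; // hash of the raw image bytes, valid when is_image == true
    size_t   n_cells;    // KV cells this chunk occupies (1 for a text token)
};

// Count how many leading KV cells of the cached prompt can be reused for the
// incoming prompt. Text tokens are compared one by one; an image chunk is
// reusable only if the whole image matches (same hash, same size) -- there is
// no meaningful partial match inside an image embedding.
static size_t reusable_kv_cells(const std::vector<Chunk> & cached,
                                const std::vector<Chunk> & incoming) {
    size_t n_reuse = 0;
    const size_t n = std::min(cached.size(), incoming.size());
    for (size_t i = 0; i < n; ++i) {
        const Chunk & a = cached[i];
        const Chunk & b = incoming[i];
        if (a.is_image != b.is_image) break;
        if (a.is_image) {
            if (a.image_hash != b.image_hash || a.n_cells != b.n_cells) break; // all-or-nothing
        } else {
            if (a.token != b.token) break;
        }
        n_reuse += a.n_cells;
    }
    return n_reuse;
}
```

The awkward step is exactly that all-or-nothing comparison: the server has to decide whether two requests contain "the same image" (e.g. by hashing the uploaded bytes) rather than comparing token IDs, and any mismatch invalidates every cached cell that follows.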