gemma3 multimodal finetuning: vision config not being properly initialized #7529

imamitsingh · 2025-03-29T04:03:42Z

Issue Description

I'm trying to finetune Gemma3 (google/gemma-3-4b-it) for a multimodal task (processing both images and text).

During initialization, I'm seeing log messages:

[INFO|configuration_gemma3.py:306] 2025-03-28 19:56:22,190 >> text_config is None, using default Gemma3TextConfig vision config.
[INFO|configuration_gemma3.py:314] 2025-03-28 19:56:22,190 >> vision_config is None or incompatible with Gemma3VisionConfig intialization. Gemma3 will be limited to text tasks.

Question

How can I configure the vision component in llamafactory so that gemma3 is initialized to process both images and text as input? Are there specific parameters I need to set in my training configuration?

Environment

llamafactory version: 0.9.3.dev0
transformers version: 4.50.0

Any guidance would be greatly appreciated!

Hojun-You · 2025-04-08T08:26:51Z

I encountered the same issue and traced the warning to its origin.

The warning is generated in lines 304–317 of transformers/models/gemma3/configuration_gemma3.py.

This is invoked by the method call config_class.from_dict(config_dict, **unused_kwargs) on line 1141 in transformers/models/auto/configuration_auto.py. This method constructs Gemma3Config from a dictionary.

The whole process starts when "config = load_config(model_args)" (L126) is executed in loader.py (located in llama-factory/model/loader.py). Although initialization emits these warnings, the configuration object is later updated with the values from config_dict, so the warnings cause no functional problems.
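If you want to confirm this on your own setup rather than rely on the log output, one option is to inspect the loaded config object and check that both sub-configs are populated. The following is only a minimal sketch: the helper names missing_subconfigs and check_pretrained are hypothetical (they are not part of transformers or llamafactory), and the sketch assumes the loaded config exposes text_config and vision_config attributes, as the log messages suggest. The demonstration at the bottom uses stand-in objects so it runs offline.

```python
from types import SimpleNamespace


def missing_subconfigs(config, names=("text_config", "vision_config")):
    """Return the sub-config names that are absent or None on `config`.

    Hypothetical helper: it only reads attributes, so it works on any
    object, including a config loaded through transformers.
    """
    return [name for name in names if getattr(config, name, None) is None]


def check_pretrained(model_id):
    """Load a config with transformers and report missing sub-configs.

    Assumption: `transformers` is installed and you have access to the
    model repo. Not called below, so the sketch runs without a network.
    """
    from transformers import AutoConfig  # imported lazily on purpose

    config = AutoConfig.from_pretrained(model_id)
    return missing_subconfigs(config)


# Offline demonstration with stand-in objects (not real Gemma3 configs).
text_only = SimpleNamespace(text_config={"hidden_size": 8}, vision_config=None)
multimodal = SimpleNamespace(
    text_config={"hidden_size": 8},
    vision_config={"image_size": 16},
)

print(missing_subconfigs(text_only))   # ['vision_config']
print(missing_subconfigs(multimodal))  # []
```

Calling check_pretrained("google/gemma-3-4b-it") in your training environment should return an empty list if the vision config really was filled in after the warnings were logged.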

imamitsingh added the bug and pending labels on Mar 29, 2025

imamitsingh changed the title ~~Gemma3 multimodal finetuning: vision config not being properly initialized~~ gemma3 multimodal finetuning: vision config not being properly initialized Mar 29, 2025

hiyouga added the solved label and removed the bug and pending labels on Apr 8, 2025
