I'm trying to fine-tune Gemma3 (google/gemma-3-4b-it) for a multimodal task (processing both images and text).
During initialization, I'm seeing these log messages:
[INFO|configuration_gemma3.py:306] 2025-03-28 19:56:22,190 >> text_config is None, using default Gemma3TextConfig vision config.
[INFO|configuration_gemma3.py:314] 2025-03-28 19:56:22,190 >> vision_config is None or incompatible with Gemma3VisionConfig intialization. Gemma3 will be limited to text tasks.
Question
How can I configure the vision component in llamafactory so that gemma3 is initialized to process both images and text as input? Are there specific parameters I need to set in my training configuration?
Environment
llamafactory version: 0.9.3.dev0
transformers version: 4.50.0
Any guidance would be greatly appreciated!
imamitsingh changed the title from "Gemma3 multimodal finetuning: vision config not being properly initialized" to "gemma3 multimodal finetuning: vision config not being properly initialized" on Mar 29, 2025
I encountered the same issue and traced the warning to its origin.
The warning is generated in lines 304–317 of transformers/models/gemma3/configuration_gemma3.py.
This is invoked by the method call config_class.from_dict(config_dict, **unused_kwargs) on line 1141 in transformers/models/auto/configuration_auto.py. This method constructs Gemma3Config from a dictionary.
The entire process starts when "config = load_config(model_args)" (L126) is executed in loader.py (located in llama-factory/model/loader.py). Although initialization emits these messages, the configuration object is later updated with the values from config_dict, so they do not indicate a functional problem.
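To confirm this on your own setup, one option is to inspect the config object after loading and check that the vision sub-config is actually populated. The snippet below is only a minimal sketch: the helper names has_vision_config and describe_modalities are hypothetical (not part of transformers or llamafactory), and the SimpleNamespace objects are stand-ins for a real config so the example runs without downloading anything. It assumes only that the loaded Gemma3 config exposes a vision_config attribute.

```python
from types import SimpleNamespace


def has_vision_config(config) -> bool:
    """Return True if the config carries a non-None vision sub-config."""
    return getattr(config, "vision_config", None) is not None


def describe_modalities(config) -> str:
    """Summarize which inputs the config appears to support."""
    return "image+text" if has_vision_config(config) else "text-only"


# Stand-ins for a loaded config (hypothetical objects, for illustration only).
multimodal = SimpleNamespace(
    text_config=SimpleNamespace(), vision_config=SimpleNamespace()
)
text_only = SimpleNamespace(text_config=SimpleNamespace(), vision_config=None)

print(describe_modalities(multimodal))
print(describe_modalities(text_only))

# With transformers installed and access to the checkpoint, the same check
# could be applied to the real config object:
#   from transformers import AutoConfig
#   config = AutoConfig.from_pretrained("google/gemma-3-4b-it")
#   print(describe_modalities(config))
```

If the real config reports a populated vision_config after loading, that would be consistent with the messages above coming from an intermediate default initialization rather than from the final configuration.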
hiyouga added the solved label and removed the bug and pending labels on Apr 8, 2025