
gemma3 multimodal finetuning: vision config not being properly initialized #7529

Open
imamitsingh opened this issue Mar 29, 2025 · 1 comment
Labels
solved This problem has been already solved

Comments


imamitsingh commented Mar 29, 2025

Issue Description

I'm trying to finetune Gemma3 (google/gemma-3-4b-it) on a multimodal task (processing both images and text).

During initialization, I see these log messages:

[INFO|configuration_gemma3.py:306] 2025-03-28 19:56:22,190 >> text_config is None, using default Gemma3TextConfig vision config.
[INFO|configuration_gemma3.py:314] 2025-03-28 19:56:22,190 >> vision_config is None or incompatible with Gemma3VisionConfig intialization. Gemma3 will be limited to text tasks.

Question

How can I properly configure the vision component in llamafactory so that Gemma3 is initialized for processing both image and text input? Are there specific parameters I need to set in my training configuration?

Environment

  • llamafactory version: 0.9.3.dev0
  • transformers version: 4.50.0

Any guidance would be greatly appreciated!
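For reference, a multimodal LoRA SFT run in LLaMA-Factory is typically described by a YAML file along these lines. This is only a sketch modeled on the example configs bundled with LLaMA-Factory; the dataset name mllm_demo, the template name, and all hyperparameter values are placeholders to verify against your installed version:

```yaml
### Sketch of a multimodal SFT config for LLaMA-Factory.
### Key names follow the bundled examples; values are placeholders.
model_name_or_path: google/gemma-3-4b-it
stage: sft
do_train: true
finetuning_type: lora
lora_target: all
dataset: mllm_demo          # placeholder: a multimodal (image+text) dataset
template: gemma             # verify the template registered for Gemma3 in your version
cutoff_len: 2048
output_dir: saves/gemma3-4b-it/lora/sft
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-4
num_train_epochs: 3.0
```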

imamitsingh added the bug (Something isn't working) and pending (This problem is yet to be addressed) labels on Mar 29, 2025
imamitsingh changed the title from "Gemma3 multimodal finetuning: vision config not being properly initialized" to "gemma3 multimodal finetuning: vision config not being properly initialized" on Mar 29, 2025
@Hojun-You commented:

I encountered the same issue and traced the warning to its origin.

The warning is generated in lines 304–317 of transformers/models/gemma3/configuration_gemma3.py.

This is invoked by the call config_class.from_dict(config_dict, **unused_kwargs) on line 1141 of transformers/models/auto/configuration_auto.py, which constructs a Gemma3Config from a dictionary.

The entire process starts when config = load_config(model_args) (L126) is executed in loader.py (located at llama-factory/model/loader.py). Although initialization emits the warnings, the configuration object is subsequently updated with the values from config_dict, so they cause no functional problems.
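The update-after-init behavior described above can be sketched with hypothetical stand-in classes (simplifications for illustration, not the real transformers implementation): the constructor runs first without a vision_config and emits the warning, then from_dict() copies the real values from the dictionary over the defaults.

```python
# Hypothetical stand-in classes mimicking the from_dict() flow.
# NOT the real transformers code.

class VisionConfig:
    def __init__(self, image_size=896):
        self.image_size = image_size

class ModelConfig:
    def __init__(self, vision_config=None):
        if vision_config is None:
            # corresponds to the "vision_config is None ..." log line
            print("vision_config is None, falling back to defaults")
        self.vision_config = vision_config

    @classmethod
    def from_dict(cls, config_dict):
        config = cls()                   # warning fires here, before the update
        for key, value in config_dict.items():
            setattr(config, key, value)  # dict values overwrite the defaults
        return config

config = ModelConfig.from_dict({"vision_config": VisionConfig(image_size=448)})
assert config.vision_config.image_size == 448  # vision settings survived
```

Because the overwrite happens immediately after construction, the warning reflects only a transient state of the config object, which is why it is harmless in practice.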

hiyouga added the solved (This problem has been already solved) label and removed the bug (Something isn't working) and pending (This problem is yet to be addressed) labels on Apr 8, 2025