Official repository for "Discovering Divergent Representations in Text-to-Image Models"
In this paper, we investigate when and how visual representations learned by two different generative models diverge. Given two text-to-image models, our goal is to discover visual attributes that appear in images generated by one model but not the other, along with the types of prompts that trigger these attribute differences. For example, *flames* might appear in one model's outputs when given prompts expressing strong emotions, while the other model does not produce this attribute given the same prompts. We introduce CompCon (Comparing Concepts), an evolutionary search algorithm that discovers visual attributes more prevalent in one model's output than the other, and uncovers the prompt concepts linked to these visual differences. To evaluate CompCon's ability to find diverging representations, we create an automated data generation pipeline to produce ID², a dataset of 60 input-dependent differences, and compare our approach to several LLM- and VLM-powered baselines. Finally, we use CompCon to compare popular text-to-image models, finding divergent representations such as how PixArt depicts prompts mentioning loneliness with wet streets and how Stable Diffusion 3.5 depicts African American people in media professions.
Note: This code is built on top of the VisDiff repo.
git clone https://github.com/adobe-research/CompCon.git
cd CompCon
pip install -r requirements.txt
See the serve folder and follow the setup instructions there to set up servers for locally hosted LLMs, VLMs, and embedding models.
All arguments (model choices, types of attribute proposers, etc.) can be found in configs/base.yaml.
Run the complete CompCon pipeline to discover differences and generate prompt descriptions:
# Using config file
python main.py --config configs/base.yaml
# Using custom dataset (overrides config data settings)
python main.py --data_file data/custom_dataset/results.csv --models sd3.5-large playground
# Just discover differences
python main.py --config configs/benchmark/base.yaml --differences_only
# Override specific parameters
python main.py --config configs/base.yaml --overrides proposer.model=gpt-4o ranker.classify_threshold=0.3
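The dotted keys passed via `--overrides` address nested entries in the YAML config. As a rough sketch of that mapping (the exact structure of `configs/base.yaml` may differ, and `apply_override` is an illustrative helper rather than the repo's actual override parser):

```python
import yaml

# Load the base config. Illustrative sketch only: the real configs/base.yaml
# may contain more sections than the keys shown here.
with open("configs/base.yaml") as f:
    cfg = yaml.safe_load(f)

def apply_override(cfg, dotted_key, value):
    """Set a nested config value addressed by a dotted key (hypothetical helper)."""
    *parents, leaf = dotted_key.split(".")
    node = cfg
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = value

# Mirrors `--overrides proposer.model=gpt-4o ranker.classify_threshold=0.3`
apply_override(cfg, "proposer.model", "gpt-4o")
apply_override(cfg, "ranker.classify_threshold", 0.3)
```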
python generate_images.py --prompts data/prompts.txt --num-imgs-per-prompt 3 --models sd3.5-large playground --name custom_dataset
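Here `data/prompts.txt` is assumed to be a plain text file with one prompt per line; that format is an assumption rather than something documented above. A minimal way to create one:

```python
# Hypothetical example prompts; the one-prompt-per-line format is assumed.
prompts = [
    "a person overwhelmed by anger",
    "an empty street on a lonely night",
]
with open("data/prompts.txt", "w") as f:
    f.write("\n".join(prompts) + "\n")
```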
python get_visual_differences.py --config configs/base.yaml --overrides data.root=data/custom_dataset
python generate_prompt_descriptions.py --attribute "flames" --data_file data/custom_dataset/results.csv --models sd3.5-large playground --threshold 0.2 --delta 0.05 --num_iterations 3
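Before pointing these scripts at a custom `results.csv`, it can help to inspect the file first. The columns written by `generate_images.py` are not documented here, so this sketch only prints what is present rather than assuming a schema:

```python
import pandas as pd

# Quick sanity check of a generated results file.
df = pd.read_csv("data/custom_dataset/results.csv")
print(df.columns.tolist())   # see which columns the pipeline wrote
print(df.head())
print(f"{len(df)} rows")
```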
The full dataset (~30 GB) is available as a single tar file:
Command line:
# With curl
curl -O -C - https://compcon-data.s3.us-east-2.amazonaws.com/ID2.tar.gz
# Or with wget
wget -c https://compcon-data.s3.us-east-2.amazonaws.com/ID2.tar.gz
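After downloading, the archive can be unpacked with Python's standard library (the `data/` destination is just a suggestion):

```python
import tarfile

# Extract the ID2 dataset archive into data/ (adjust the path as needed).
with tarfile.open("ID2.tar.gz", "r:gz") as tar:
    tar.extractall(path="data")
```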
# Using config file
python benchmark_methods/compcon_benchmark.py --config "configs/benchmark/bats.yaml" --name "bats" --project CompCon-ID2
# Override config with custom data and models
python benchmark_methods/compcon_benchmark.py --config "configs/benchmark/bats.yaml" --data_file "data/custom/results.csv" --models model1 model2 --name "custom_run" --project CompCon-ID2
# Override specific parameters and models
python benchmark_methods/compcon_benchmark.py --name bats --data_file data/ID2_sample/bats.csv --proposer_model gpt-4o --evaluator_model gpt-4o --llm_model gpt-4o --threshold 0.2 --delta 0.05
# Using config file
python benchmark_methods/llm_only_baseline.py --name bats --data_file data/ID2_sample/bats.csv --project llm-only
# Override models used for method and evaluation
python benchmark_methods/llm_only_baseline.py --name bats --data_file data/ID2_sample/bats.csv --proposer_model gpt-4o --evaluator_model gpt-4o --project llm-only
# Using config file
python benchmark_methods/tfidf_baseline.py --name bats --data_file data/ID2_sample/bats.csv --project tfidf-only
All benchmark scripts support these model override arguments:
- `--proposer_model`: Model used for generating hypotheses/associations (e.g., `gpt-4o`, `gpt-4o-mini`)
- `--evaluator_model`: Model used for scoring and evaluating results (e.g., `gpt-4o`, `gpt-4o-mini`)
- `--llm_model`: Model used for hypothesis refinement and deduplication in CompCon's iterative process
Usage Notes:
- These arguments override the corresponding models specified in config files
- `proposer_model` controls the core method logic (hypothesis generation in CompCon, image analysis in LLM-only)
- `evaluator_model` controls final scoring and evaluation of discovered associations
- `llm_model` controls CompCon's iterative refinement steps (evolving hypotheses, deduplicating similar ones)
- If not specified, defaults from config files are used (typically `gpt-4o` for all)
@inproceedings{dunlap2025discovering,
title = {Discovering Divergent Representations between Text-to-Image Models},
author = {Dunlap, Lisa and Gonzalez, Joseph E. and Darrell, Trevor and Caba Heilbron, Fabian and Sivic, Josef and Russell, Bryan},
booktitle = {Proceedings of the International Conference on Computer Vision (ICCV)},
year = {2025},
organization = {IEEE/CVF},
}