Official repository for "Discovering Divergent Representations in Text-to-Image Models"
In this paper, we investigate when and how visual representations learned by two different generative models diverge. Given two text-to-image models, our goal is to discover visual attributes that appear in images generated by one model but not the other, along with the types of prompts that trigger these attribute differences. For example, *flames* might appear in one model's outputs when given prompts expressing strong emotions, while the other model does not produce this attribute given the same prompts. We introduce CompCon (Comparing Concepts), an evolutionary search algorithm that discovers visual attributes more prevalent in one model's output than the other, and uncovers the prompt concepts linked to these visual differences. To evaluate CompCon's ability to find diverging representations, we create an automated data generation pipeline to produce ID², a dataset of 60 input-dependent differences, and compare our approach to several LLM- and VLM-powered baselines. Finally, we use CompCon to compare popular text-to-image models, finding divergent representations such as how PixArt depicts prompts mentioning loneliness with wet streets and how Stable Diffusion 3.5 depicts African American people in media professions.
Note: This code is built on top of the VisDiff repo.
git clone https://github.com/adobe-research/CompCon.git
cd CompCon
pip install -r requirements.txt
See the serve folder and follow the setup instructions there to set up servers for locally hosted LLMs, VLMs, and embedding models.
All arguments (model choices, types of attribute proposers, etc.) can be found in configs/base.yaml.
Run the complete CompCon pipeline to discover differences and generate prompt descriptions:
# Using config file
python main.py --config configs/base.yaml
# Using custom dataset (overrides config data settings)
python main.py --data_file data/custom_dataset/results.csv --models sd3.5-large playground
# Just discover differences
python main.py --config configs/benchmark/base.yaml --differences_only
# Override specific parameters
python main.py --config configs/base.yaml --overrides proposer.model=gpt-4o ranker.classify_threshold=0.3
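The dotted keys passed via `--overrides` address nested entries in the YAML config. As a rough sketch of that mapping (the exact structure of `configs/base.yaml` may differ, and `apply_override` is an illustrative helper rather than the repo's actual override parser):

```python
import yaml

# Load the base config. Illustrative sketch only: the real configs/base.yaml
# may contain more sections than the keys shown here.
with open("configs/base.yaml") as f:
    cfg = yaml.safe_load(f)

def apply_override(cfg, dotted_key, value):
    """Set a nested config value addressed by a dotted key (hypothetical helper)."""
    *parents, leaf = dotted_key.split(".")
    node = cfg
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = value

# Mirrors `--overrides proposer.model=gpt-4o ranker.classify_threshold=0.3`
apply_override(cfg, "proposer.model", "gpt-4o")
apply_override(cfg, "ranker.classify_threshold", 0.3)
```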
python generate_images.py --prompts data/prompts.txt --num-imgs-per-prompt 3 --models sd3.5-large playground --name custom_dataset
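Here `data/prompts.txt` is assumed to be a plain text file with one prompt per line; that format is an assumption rather than something documented above. A minimal way to create one:

```python
# Hypothetical example prompts; the one-prompt-per-line format is assumed.
prompts = [
    "a person overwhelmed by anger",
    "an empty street on a lonely night",
]
with open("data/prompts.txt", "w") as f:
    f.write("\n".join(prompts) + "\n")
```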
python get_visual_differences.py --config configs/base.yaml --overrides data.root=data/custom_dataset
python generate_prompt_descriptions.py --attribute "flames" --data_file data/custom_dataset/results.csv --models sd3.5-large playground --threshold 0.2 --delta 0.05 --num_iterations 3
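Before pointing these scripts at a custom `results.csv`, it can help to inspect the file first. The columns written by `generate_images.py` are not documented here, so this sketch only prints what is present rather than assuming a schema:

```python
import pandas as pd

# Quick sanity check of a generated results file.
df = pd.read_csv("data/custom_dataset/results.csv")
print(df.columns.tolist())   # see which columns the pipeline wrote
print(df.head())
print(f"{len(df)} rows")
```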
The full dataset (~30 GB) is available as a single tar file:
Command line:
# With curl
curl -O -C - https://compcon-data.s3.us-east-2.amazonaws.com/ID2.tar.gz
# Or with wget
wget -c https://compcon-data.s3.us-east-2.amazonaws.com/ID2.tar.gz
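After downloading, the archive can be unpacked with Python's standard library (the `data/` destination is just a suggestion):

```python
import tarfile

# Extract the ID2 dataset archive into data/ (adjust the path as needed).
with tarfile.open("ID2.tar.gz", "r:gz") as tar:
    tar.extractall(path="data")
```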
# Using config file
python benchmark_methods/compcon_benchmark.py --config "configs/benchmark/bats.yaml" --name "bats" --project CompCon-ID2
# Override config with custom data and models
python benchmark_methods/compcon_benchmark.py --config "configs/benchmark/bats.yaml" --data_file "data/custom/results.csv" --models model1 model2 --name "custom_run" --project CompCon-ID2
# Override specific parameters and models
python benchmark_methods/compcon_benchmark.py --name bats --data_file data/ID2_sample/bats.csv --proposer_model gpt-4o --evaluator_model gpt-4o --llm_model gpt-4o --threshold 0.2 --delta 0.05
# Using config file
python benchmark_methods/llm_only_baseline.py --name bats --data_file data/ID2_sample/bats.csv --project llm-only
# Override models used for method and evaluation
python benchmark_methods/llm_only_baseline.py --name bats --data_file data/ID2_sample/bats.csv --proposer_model gpt-4o --evaluator_model gpt-4o --project llm-only
# Using config file
python benchmark_methods/tfidf_baseline.py --name bats --data_file data/ID2_sample/bats.csv --project tfidf-only
All benchmark scripts support these model override arguments:
- `--proposer_model`: Model used for generating hypotheses/associations (e.g., `gpt-4o`, `gpt-4o-mini`)
- `--evaluator_model`: Model used for scoring and evaluating results (e.g., `gpt-4o`, `gpt-4o-mini`)
- `--llm_model`: Model used for hypothesis refinement and deduplication in CompCon's iterative process
Usage Notes:
- These arguments override the corresponding models specified in config files
- `proposer_model` controls the core method logic (hypothesis generation in CompCon, image analysis in LLM-only)
- `evaluator_model` controls final scoring and evaluation of discovered associations
- `llm_model` controls CompCon's iterative refinement steps (evolving hypotheses, deduplicating similar ones)
- If not specified, defaults from config files are used (typically `gpt-4o` for all)
@inproceedings{dunlap2025discovering,
title = {Discovering Divergent Representations between Text-to-Image Models},
author = {Dunlap, Lisa and Gonzalez, Joseph E. and Darrell, Trevor and Caba Heilbron, Fabian and Sivic, Josef and Russell, Bryan},
booktitle = {Proceedings of the International Conference on Computer Vision (ICCV)},
year = {2025},
organization = {IEEE/CVF},
}