GitHub - hasanar1f/PAINT: [CVPR 2025 Workshop] PAINT (Paying Attention to INformed Tokens) is a plug-and-play framework that intervenes in the self-attention of the LLM and selectively boosts attention to informed visual tokens to mitigate hallucination in Vision Language Models

PAINT (Paying Attention to INformed Tokens)

Paper: PAINT: PAYING ATTENTION TO INFORMED TOKENS TO MITIGATE HALLUCINATION IN LARGE VISION-LANGUAGE MODEL

Abstract: Large Vision Language Models (LVLMs) have demonstrated remarkable capabilities in understanding and describing visual content, achieving state-of-the-art performance across various vision-language tasks. However, these models often generate descriptions containing objects or details that are absent in the input image, a phenomenon commonly known as hallucination. Our work investigates the key reasons behind this issue by analyzing the attention patterns of tokens across transformer layers and heads. We find that hallucinations often arise from the progressive weakening of attention to visual tokens in the deeper layers of the LLM. Some previous works naively boost the attention of all visual tokens to mitigate this issue, resulting in suboptimal hallucination reduction. To address this, we identify two critical sets of visual tokens that facilitate the transfer of visual information from the vision encoder to the LLM. Local tokens encode grounded information about objects present in an image, while summary tokens capture the overall aggregated representation of the image. Importantly, these two sets of tokens require different levels of attention enhancement. To this end, we propose PAINT (Paying Attention to INformed Tokens), a plug-and-play framework that intervenes in the self-attention mechanism of the LLM, selectively boosting the attention of local and summary tokens with learned margins. Extensive experiments on the MSCOCO dataset demonstrate that our approach reduces hallucination rates by up to 62.3% compared to baseline models while maintaining strong task performance.
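
The selective boosting described in the abstract can be pictured with a small sketch. It is an illustration only, not the repository's implementation: it assumes the boost is an additive margin on the pre-softmax attention scores of a single query, and the function name, index sets, and margin values are all hypothetical (the abstract does not give the exact formula, and here the margins are fixed constants rather than learned).

```python
import math

def boost_attention(scores, local_idx, summary_idx, local_margin, summary_margin):
    """Return softmax attention weights after boosting selected key positions.

    Assumption: the boost is an additive margin on pre-softmax scores.
    All names and values here are hypothetical, for illustration only.
    """
    boosted = list(scores)
    for i in local_idx:
        boosted[i] += local_margin
    for i in summary_idx:
        boosted[i] += summary_margin
    # Numerically stable softmax over the key positions.
    peak = max(boosted)
    exps = [math.exp(s - peak) for s in boosted]
    total = sum(exps)
    return [e / total for e in exps]

# Toy example: 6 key positions; 0-3 stand in for visual tokens, 4-5 for text tokens.
scores = [0.1, 0.2, 0.0, 0.3, 1.0, 1.2]
baseline = boost_attention(scores, [], [], 0.0, 0.0)
boosted = boost_attention(scores, local_idx=[1], summary_idx=[3],
                          local_margin=1.0, summary_margin=0.5)
```

The two sets receive different margins, echoing the abstract's point that local and summary tokens need different levels of enhancement. Because the weights still sum to one, attention on the untouched positions shrinks as the selected visual tokens gain weight.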

Installation and Setup

conda env create -f modPAI/environment.yml
conda activate modpai

Folders and files

images
lmms-eval
modPAI
modpai
pai
results
transformers
~/nltk_data/tokenizers
.gitignore
README.md
chair.pkl
installation.sh
modpai_eval.sh
paint_pipeline.jpg
