
Optionality of attention_mask argument in Attention classes/functions. #37046

Closed
Godofnothing opened this issue Mar 27, 2025 · 1 comment

Godofnothing commented Mar 27, 2025

In the current stable version of transformers, the attention_mask argument is annotated as Optional[torch.Tensor] (see, for example, modeling_llama.py).

However, it is in fact a required argument: it has no default value, so callers must always pass it explicitly.

At the same time, the enclosing LlamaDecoderLayer class accepts this argument as Optional (see).

Delving deeper, flash_attention_forward annotates attention_mask as Optional[torch.Tensor] and internally calls _flash_attention_forward, which declares attention_mask as a required torch.Tensor argument. However, _flash_attention_forward guards its use of the mask with a check that attention_mask is not None, so in practice it can be called with attention_mask set to None.
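
To make the mismatch concrete, here is a minimal sketch of the two signature styles described above. The function names attention_forward and flash_attention_helper and their parameter lists are illustrative, not the exact transformers code:

```python
from typing import Optional

import torch


# Current state as described above: attention_mask is annotated as Optional,
# but it has no default value, so callers must always pass something for it.
def attention_forward(
    query: torch.Tensor,
    key: torch.Tensor,
    value: torch.Tensor,
    attention_mask: Optional[torch.Tensor],  # Optional annotation, yet no default
    **kwargs,
):
    ...


# Current state of the flash-attention helper as described above: attention_mask
# is annotated as a plain torch.Tensor, yet the body guards against None,
# so calling it with None works in practice.
def flash_attention_helper(
    query_states: torch.Tensor,
    key_states: torch.Tensor,
    value_states: torch.Tensor,
    attention_mask: torch.Tensor,
    **kwargs,
):
    if attention_mask is not None:
        ...  # variable-length path that unpads the batch using the mask
    else:
        ...  # dense path that runs without a mask
```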

I suggest correcting the typing by making attention_mask an optional argument with None as its default value.
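
A minimal sketch of the suggested fix, reusing the hypothetical signature from the sketch above: give attention_mask a default of None so the annotation and the call signature agree.

```python
from typing import Optional

import torch


# Suggested fix (sketch): the Optional annotation now matches a None default,
# so callers may simply omit attention_mask.
def attention_forward(
    query: torch.Tensor,
    key: torch.Tensor,
    value: torch.Tensor,
    attention_mask: Optional[torch.Tensor] = None,
    **kwargs,
) -> torch.Tensor:
    # Plain scaled dot-product attention as a placeholder body.
    scores = query @ key.transpose(-1, -2) / key.shape[-1] ** 0.5
    if attention_mask is not None:
        scores = scores + attention_mask  # additive mask
    return torch.softmax(scores, dim=-1) @ value
```

With the default in place, both attention_forward(q, k, v) and attention_forward(q, k, v, attention_mask=mask) are valid calls, which matches how LlamaDecoderLayer already treats the argument.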

Zephyr271828 (Contributor) commented Mar 31, 2025

Hi @Godofnothing! I just submitted a PR to fix this issue. Feel free to give me suggestions if there are any problems :)
