
Optionality of attention_mask argument in Attention classes/functions. #37046

Closed
Godofnothing opened this issue Mar 27, 2025 · 1 comment

Godofnothing commented Mar 27, 2025

In the current stable version of transformers, the attention_mask argument is annotated as Optional[torch.Tensor] (see, for example, modeling_llama.py).

However, it is in fact a required argument: it has no default value, so callers must always pass it explicitly.

At the same time, the enclosing LlamaDecoderLayer class accepts this argument as Optional (see).

Delving deeper, flash_attention_forward annotates attention_mask as Optional[torch.Tensor] and internally calls _flash_attention_forward, which declares attention_mask as a required torch.Tensor argument. However, _flash_attention_forward guards its use of the mask with a check that attention_mask is not None, so in practice it can be called with attention_mask set to None.
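
To make the mismatch concrete, here is a minimal sketch of the two signature styles described above. The function names attention_forward and flash_attention_helper and their parameter lists are illustrative, not the exact transformers code:

```python
from typing import Optional

import torch


# Current state as described above: attention_mask is annotated as Optional,
# but it has no default value, so callers must always pass something for it.
def attention_forward(
    query: torch.Tensor,
    key: torch.Tensor,
    value: torch.Tensor,
    attention_mask: Optional[torch.Tensor],  # Optional annotation, yet no default
    **kwargs,
):
    ...


# Current state of the flash-attention helper as described above: attention_mask
# is annotated as a plain torch.Tensor, yet the body guards against None,
# so calling it with None works in practice.
def flash_attention_helper(
    query_states: torch.Tensor,
    key_states: torch.Tensor,
    value_states: torch.Tensor,
    attention_mask: torch.Tensor,
    **kwargs,
):
    if attention_mask is not None:
        ...  # variable-length path that unpads the batch using the mask
    else:
        ...  # dense path that runs without a mask
```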

I suggest correcting the typing by making attention_mask an optional argument with None as its default value.
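
A minimal sketch of the suggested fix, reusing the hypothetical signature from the sketch above: give attention_mask a default of None so the annotation and the call signature agree.

```python
from typing import Optional

import torch


# Suggested fix (sketch): the Optional annotation now matches a None default,
# so callers may simply omit attention_mask.
def attention_forward(
    query: torch.Tensor,
    key: torch.Tensor,
    value: torch.Tensor,
    attention_mask: Optional[torch.Tensor] = None,
    **kwargs,
) -> torch.Tensor:
    # Plain scaled dot-product attention as a placeholder body.
    scores = query @ key.transpose(-1, -2) / key.shape[-1] ** 0.5
    if attention_mask is not None:
        scores = scores + attention_mask  # additive mask
    return torch.softmax(scores, dim=-1) @ value
```

With the default in place, both attention_forward(q, k, v) and attention_forward(q, k, v, attention_mask=mask) are valid calls, which matches how LlamaDecoderLayer already treats the argument.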

Zephyr271828 (Contributor) commented Mar 31, 2025

Hi @Godofnothing! I just submitted a PR to fix this issue. Feel free to give me suggestions if there are any problems :)
