
Conversation

Contributor

@i3hz i3hz commented Nov 11, 2025

What does this PR do?

Fixes the issue where models use an outdated if self.config._attn_implementation != "flash_attention_2": check.

Models changed: SmolVLM, idefics3, idefics2

Fixes #42121
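
In rough terms, the fix swaps a backend-specific branch for the shared masking helper. Below is a minimal before/after sketch of the pattern inside the vision attention forward; the variable names are illustrative and the keyword arguments of `create_bidirectional_mask` are assumed to mirror the other `masking_utils` helpers, so see the actual diff for the exact call:

```python
from transformers.masking_utils import create_bidirectional_mask
from transformers.modeling_attn_mask_utils import _prepare_4d_attention_mask

# Before: the 4D mask was built only when the backend was not flash_attention_2,
# so any other implementation (e.g. the kernels-community flash-attn3 kernels)
# landed on a code path with a mask format it cannot consume.
if self.config._attn_implementation != "flash_attention_2":
    patch_attention_mask = _prepare_4d_attention_mask(patch_attention_mask, hidden_states.dtype)

# After (sketch): build the mask through the shared helper, which dispatches
# per attention implementation instead of hard-coding "flash_attention_2".
patch_attention_mask = create_bidirectional_mask(
    config=self.config,
    input_embeds=hidden_states,
    attention_mask=patch_attention_mask,
)
```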

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

@zucchini-nlp @vasqu

Contributor

@vasqu vasqu left a comment


I'd like to fix idefics2 properly or leave it out for now; the others LGTM. Will check with run-slow a bit later.

Comment on lines 503 to 505
# The call to `_upad_input` in `_flash_attention_forward` is expensive
# So when the `patch_attention_mask` is full of 1s (i.e. attending to the whole sequence),
# avoiding passing the attention_mask, which is equivalent to attending to the full sequence
Contributor


Let's remove these comments too

from ...cache_utils import Cache, DynamicCache
from ...generation import GenerationMixin
from ...masking_utils import create_bidirectional_mask
from ...modeling_attn_mask_utils import _prepare_4d_attention_mask
Contributor

@vasqu vasqu Nov 11, 2025


Seems like `_prepare_4d_attention_mask` is still used and will likely cause similar issues at other points; best to completely remove this usage elsewhere too!

See

attention_mask = (
    _prepare_4d_attention_mask(attention_mask, latents.dtype, tgt_len=self.n_latents)
    if self.config._attn_implementation != "flash_attention_2"
    else attention_mask
)

(looks like the flag for that model there also needs to be updated to _supports_flash_attn)
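
For reference, a minimal sketch of that flag change on the pretrained-model class (the old flag name and the surrounding attributes are assumptions about the prior state; check the current idefics2 modeling file):

```python
from transformers import Idefics2Config, PreTrainedModel

# Sketch only: the point is the capability flag, not a full model definition.
class Idefics2PreTrainedModel(PreTrainedModel):
    config_class = Idefics2Config
    # Old, FA2-specific flag (assumed prior state):
    # _supports_flash_attn_2 = True
    # New, backend-agnostic flag referred to in the comment above:
    _supports_flash_attn = True
```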

Contributor Author


So do I just reset all the changes from idefics2?

Contributor


Yea, either that, or use the create_bidirectional_mask fn here as well if it works; if it doesn't, I'd also appreciate knowing that. Means I need to take a look here :D
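
A minimal sketch of that suggestion for the perceiver resampler, assuming `create_bidirectional_mask` accepts the same `config`/`input_embeds`/`attention_mask` keywords as the other `masking_utils` helpers (untested; as noted further down in the thread, the latents vs. sequence-length mismatch may need extra handling):

```python
from transformers.masking_utils import create_bidirectional_mask

# Sketch only: replace the FA2-specific branch around _prepare_4d_attention_mask.
# The perceiver cross-attends n_latents query vectors to the full sequence, so the
# mask's key length has to follow the sequence rather than the latents; whether
# the helper handles that as-is is exactly what needs to be verified.
attention_mask = create_bidirectional_mask(
    config=self.config,
    input_embeds=latents,
    attention_mask=attention_mask,
)
```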

Contributor Author


Alright, I can't run the testing script as I don't have enough VRAM, but is this the correct approach?

Contributor Author

i3hz commented Nov 12, 2025

Tests seem to be failing, so I don't think it's the correct one.

@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: idefics3, smolvlm

Member

@zucchini-nlp zucchini-nlp left a comment


LGTM! I think there are more models that still check for flash_attention_2; it would be nice to batch-update them all. It can totally go in a separate PR later :)

@zucchini-nlp
Member

run-slow: idefics3, smolvlm

@github-actions
Contributor

This comment contains run-slow, running the specified jobs:

models: ["models/idefics3", "models/smolvlm"]
quantizations: []

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@github-actions
Contributor

CI Results

Workflow Run ⚙️

✅ No failing test specific to this PR 🎉 !

Contributor

@vasqu vasqu left a comment


Thank you, let's keep it simple for now. We can address more models in separate PRs.

Contributor

vasqu commented Nov 12, 2025

> Tests seem to be failing, so I don't think it's the correct one.

Hmm it might need a dummy embedding of the correct size along the latents. Would leave this for a different PR

@vasqu vasqu merged commit fcea1e1 into huggingface:main Nov 12, 2025
18 checks passed
Contributor

vasqu commented Nov 12, 2025

Also thx for all the PR 🤗

Contributor Author

i3hz commented Nov 12, 2025

> LGTM! I think there are more models that still check for flash_attention_2; it would be nice to batch-update them all. It can totally go in a separate PR later :)

Yeah, I didn't want to make a whole lot of changes in a single PR; I find it very confusing. Sorry if that wasn't the ideal choice.

@i3hz i3hz deleted the flash3 branch November 13, 2025 03:05


Development

Successfully merging this pull request may close these issues.

kernels-community/flash-attn3 does not work with SmolVLM2
