
Conversation

@remi-or
Collaborator

@remi-or remi-or commented Oct 14, 2025

This PR fixes three things in gemma3:

  • a multiple-device error where torch.where took some of its coefficients from a full_like tensor that was not on the right device; since that tensor was just a full_like, we replace it with the scalar filling element
  • an error in flash_attn_inference_equivalence, caused by the model needing more parameters than are generated by default. To avoid this, we add a flag that specifies whether the forward pass should also be checked in training mode, and make this check the default for both the right- and left-padding variants (cc. @vasqu )
  • the test flash_attn_from_config was failing for the same reason (token_type_ids is a required model input when training), so I added a .eval() call to avoid this. The model does not seem to need to be in train mode for this test, but I can also add an option to only skip .eval() when a flag is passed
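The device fix in the first bullet can be sketched as follows. This is a minimal illustration, not the actual gemma3 code: building the fallback values with full_like can materialize a tensor on the wrong device in multi-device setups, whereas passing the scalar fill value directly to torch.where sidesteps the device question entirely.

```python
import torch

# Illustrative only, not the actual gemma3 code.
mask = torch.tensor([True, False, True])
scores = torch.tensor([0.1, 0.2, 0.3])

# Before: materialize a filled tensor; in multi-device setups this tensor
# can end up on a different device than `scores`, breaking torch.where.
out_before = torch.where(mask, scores, torch.full_like(scores, float("-inf")))

# After: torch.where accepts a plain Python scalar, which is device-agnostic.
out_after = torch.where(mask, scores, float("-inf"))

assert torch.equal(out_before, out_after)
```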

@remi-or remi-or requested a review from vasqu October 14, 2025 11:02
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Contributor

@vasqu vasqu left a comment


Overall happy with the changes; imo we should move the training-mode check to the config test instead, and check with run-slow on our CIs just to be sure

Comment on lines 3124 to 3127
    # Check it can run in training mode
    if check_forward_in_train:
        model.train()
        _ = model(**second_inputs)
Contributor

Would it make more sense on the flash_attn_from_config test (with the new kwarg, defaulting to checking training)? On second thought, it's still a bit odd to have this in this test --> here we only want to check inference equivalence tbh; the config test is more general and checks whether things break.

Collaborator Author

Seems like a good idea -- that test is already doing a fwd in train mode. Changing it.

    config, attn_implementation=attn_implementation, dtype=torch.bfloat16
).to(torch_device)
if test_fwd_in_train:
    fa_model = fa_model.train()
Contributor

Can we add a small comment here to clarify that it is indeed different, e.g. dropout? Otherwise, lgtm

Member

Actually, can we add another comment explaining that we set train mode because it's strictly harder than eval? I.e., if it works in train, it works in eval, but not necessarily the other way around. It's just not obvious otherwise why we would set train mode here by default.
Sorry for being annoying, but I didn't get it at first glance.
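The "train is strictly harder than eval" point can be seen with a dropout layer (a toy illustration, unrelated to the actual test code): dropout is active under train() and a no-op under eval(), so a forward pass that runs cleanly in train mode exercises strictly more behavior than one in eval mode.

```python
import torch
import torch.nn as nn

# Toy illustration: dropout is active in train mode, identity in eval mode.
torch.manual_seed(0)
dropout = nn.Dropout(p=0.5)
x = torch.ones(1000)

dropout.train()
out_train = dropout(x)   # roughly half the elements zeroed, survivors scaled by 1/(1-p)

dropout.eval()
out_eval = dropout(x)    # eval-mode dropout changes nothing

assert torch.equal(out_eval, x)
assert (out_train == 0).any()  # train mode actually dropped something
```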

@vasqu
Contributor

vasqu commented Oct 14, 2025

run-slow: gemma3

@github-actions
Contributor

This comment contains run-slow, running the specified jobs:

models: ['models/gemma3']
quantizations: [] ...

@vasqu
Contributor

vasqu commented Oct 14, 2025

Even better than main CI ❤️ feel free to merge after adding a small comment on why train vs eval

Comment on lines -3117 to -3119
    # Check it can run in training mode
    model.train()
    _ = model(**second_inputs)
Member

Indeed, from the name of the test it does not seem necessary


@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: gemma3

@Cyrilvallez Cyrilvallez merged commit 9e4199e into huggingface:main Oct 14, 2025
14 of 25 checks passed
i3hz pushed a commit to i3hz/transformers that referenced this pull request Oct 15, 2025
* Multiple device error fix

* FA2 equivalence fix

* Move the train fwd in cfg test

* Style

* Added comment

* Made the comment more clear
ngazagna-qc pushed a commit to ngazagna-qc/transformers that referenced this pull request Oct 23, 2025
* Multiple device error fix

* FA2 equivalence fix

* Move the train fwd in cfg test

* Style

* Added comment

* Made the comment more clear
