
Conversation

@Ssukriti (Contributor)

What does this PR do?

Background: #36591

Fixes loss computation when the vocabulary is resized via resize_embeddings. Only the vocab size of the parent ConditionalGeneration class is updated, so the loss has to be calculated there. This is the approach chosen until a bigger refactor; the solution was discussed with @zucchini-nlp in the PR linked above.
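For context, a minimal sketch of the idea with illustrative names only (not the actual transformers code): the shifted cross-entropy loss has to use the vocab size that resize_token_embeddings keeps up to date, which lives on the ConditionalGeneration config rather than on the inner causal LM.

```python
import torch
import torch.nn as nn

# Hypothetical helper sketching where the loss now lives: it is called from the
# ConditionalGeneration class, whose config vocab size is the one that
# resize_token_embeddings actually updates.
def causal_lm_loss(logits: torch.Tensor, labels: torch.Tensor, vocab_size: int) -> torch.Tensor:
    # standard next-token objective: predict token t+1 from tokens <= t
    shift_logits = logits[..., :-1, :].contiguous()
    shift_labels = labels[..., 1:].contiguous()
    # flattening with a stale (pre-resize) vocab_size is what made the loss fail before
    return nn.CrossEntropyLoss()(
        shift_logits.view(-1, vocab_size),
        shift_labels.view(-1),
    )
```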

Fixes #36590

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

@zucchini-nlp, as per the solution discussed above

            return output

        return CausalLMOutputWithPast(
            loss=loss,
@Ssukriti (Contributor Author), Mar 19, 2025:

I still left the rest of the return types in the class for minimal changes, since you mentioned there would be a massive refactor, @zucchini-nlp. So I just moved the loss computation to the ConditionalGeneration class. With that change there may not be a need for the MllamaForCausalLM class at all: it basically just adds the logits, which could also be moved to the ConditionalGeneration class, and the MllamaTextModel class could then be used from ConditionalGeneration directly.

I will leave that to you as you think through the refactor, or I can clean it up if you want.

@zucchini-nlp (Member):

Same here, leave it as is for users who load only the LLM part.

@Ssukriti (Contributor Author):

I will add it back for users that use this class directly, but will not pass labels from the generation class, to avoid the exception.
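Roughly, the split looks like this; a sketch only, with assumed names (self.language_model, self._compute_loss), not the real Mllama forward:

```python
# Illustrative sketch: the ConditionalGeneration forward delegates to the
# causal LM *without* labels, so the child class never computes a loss with a
# stale vocab size; the parent computes it once with its own, resized config.
def forward(self, input_ids=None, labels=None, **kwargs):
    outputs = self.language_model(input_ids=input_ids, **kwargs)  # labels deliberately not passed
    loss = None
    if labels is not None:
        # hypothetical helper using the parent's (up-to-date) vocab size
        loss = self._compute_loss(outputs.logits, labels)
    return loss, outputs
```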

@Ssukriti changed the title from "fix loss computation after embeddings resize" to "fix: loss computation after embeddings resize - mllama" on Mar 19, 2025.
@Ssukriti marked this pull request as ready for review on March 19, 2025 23:47.
        out_embeds = model(inputs_embeds=inputs_embeds, **inputs)[0]
        torch.testing.assert_close(out_embeds, out_ids)

    def test_resize_embeddings_results_in_successful_loss(self):
@Ssukriti (Contributor Author):

This is a test for the reported bug: it would fail before this change and now passes.
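For reference, the test has roughly this shape (a sketch only: fixture names such as model_tester and torch_device follow the usual transformers test conventions and may not match the actual test, and the usual test-file imports are assumed):

```python
def test_resize_embeddings_results_in_successful_loss(self):
    config, inputs = self.model_tester.prepare_config_and_inputs_for_common()
    model = MllamaForConditionalGeneration(config).to(torch_device).eval()

    # grow the vocabulary; before this fix the loss computation below failed
    model.resize_token_embeddings(model.config.text_config.vocab_size + 10)

    # train-style forward pass: passing labels triggers the loss computation
    inputs["labels"] = inputs["input_ids"].clone()
    with torch.no_grad():
        loss = model(**inputs).loss

    self.assertIsNotNone(loss)
    self.assertTrue(torch.isfinite(loss).all())
```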

@zucchini-nlp (Member) left a comment:

Great, thanks a lot! This looks better to me as a temporary workaround until the refactoring. I left some comments about removing the loss from CausalLM completely, but otherwise LGTM.

            return output

        return CausalLMOutputWithPast(
            loss=loss,
@zucchini-nlp (Member):

Same here, leave it as is for users who load only the LLM part.

@Ssukriti (Contributor Author):

@zucchini-nlp, all tests have passed and the comments have been addressed. Thank you for the review.

@zucchini-nlp (Member) left a comment:

Perfect, thanks!

@Ssukriti (Contributor Author):

Thank you @zucchini-nlp. Can the PR be merged soon as well? It is blocking a use case we have.

@zucchini-nlp (Member):

Yep, merging, sorry

@zucchini-nlp merged commit 90e2df5 into huggingface:main on Mar 21, 2025.
12 checks passed
@Ssukriti deleted the test_fix_loss_computation branch on April 10, 2025 23:43.
zucchini-nlp pushed a commit to zucchini-nlp/transformers that referenced this pull request May 14, 2025
…6840)

* move loss to generation class

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* code cleanup

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* test for resize and loss computation

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* fix tests

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* fix:test for resize and loss

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* fix resize embedding mllama test

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* review changes

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

---------

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
