
Conversation

@Ssukriti Ssukriti commented Mar 6, 2025

What does this PR do?

This PR updates the config of the mllama language model so that its vocab size stays in sync. This is needed when the vocab size of the MllamaForConditionalGeneration class changes after resizing embeddings with model.resize_token_embeddings(embedding_size). More details are included in the issue below.
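To illustrate the mechanics behind the bug, here is a minimal, hedged sketch in plain PyTorch of what resizing a token embedding does and why a config that still records the old size goes stale. The `nn.Embedding` and the `config` dict stand in for the real `MllamaForConditionalGeneration` internals; all names and sizes here are illustrative, not the actual implementation.

```python
# Illustrative sketch only: a plain nn.Embedding stands in for the model's
# token embedding, and a dict stands in for the language model's config.
import torch
import torch.nn as nn

old_vocab, hidden, new_vocab = 100, 16, 110

embed = nn.Embedding(old_vocab, hidden)

# Resizing copies the old rows and appends freshly initialized ones,
# roughly what resize_token_embeddings does under the hood.
resized = nn.Embedding(new_vocab, hidden)
resized.weight.data[:old_vocab] = embed.weight.data

# If the config still records the old size, downstream code that reads
# config.vocab_size (e.g. to reshape logits for the loss) disagrees with
# the actual weight shape.
config = {"vocab_size": old_vocab}              # stale after the resize
assert resized.num_embeddings != config["vocab_size"]

config["vocab_size"] = resized.num_embeddings   # keeping them in sync
```

The PR's fix amounts to the last line: propagating the new size into the language model's config after a resize.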

The PR includes one potential fix. There are several ways to do this. Suggestions welcome!

Fixes #36590

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

Models:

@Ssukriti Ssukriti marked this pull request as ready for review March 14, 2025 20:36
@Ssukriti
Contributor Author

@zucchini-nlp the PR is ready for review; changes have been made as discussed earlier (in the resolved conversation above).

  1. All resize-embeddings tests are passing, which indicates no change in current behavior.
  2. Added new unit tests for the function set_vocab_size.
  3. The mllama bug "Error when changing vocab size when fine tuning llama-vision" (#36590) is fixed. The llava class does not use the vocab size in its loss, so the bug did not impact it. On a quick look, I didn't find other multimodal classes that could be impacted either. Hence, the fix in this PR is only for mllama.

@Ssukriti Ssukriti changed the title update config for vocab size of langauge model mllama fix: update vocab size of language model's config on resize - mllama Mar 17, 2025
Member

@zucchini-nlp zucchini-nlp left a comment


Hey, sorry it took so long to come back to this!

After giving it more thought, I think the easiest solution is to stop passing labels to the language model and calculate the loss from within the ConditionalGeneration class. Something like:

```python
logits = outputs.logits
loss = None
if labels is not None:
    # Upcast to float if we need to compute the loss to avoid potential precision issues
    logits = logits.float()
    shift_logits = logits[..., :-1, :]
    shift_labels = labels[..., 1:]
    if attention_mask is not None:
        # we use the input attention mask to shift the logits and labels, because it is 2D.
        # we also crop attn mask in case it is longer, which happens in PrefixTuning with peft
        shift_attention_mask = attention_mask[:, -shift_logits.shape[1] :].to(logits.device)
        shift_logits = shift_logits[shift_attention_mask.to(logits.device) != 0].contiguous()
        shift_labels = shift_labels[shift_attention_mask.to(shift_labels.device) != 0].contiguous()
    else:
        shift_logits = shift_logits.contiguous()
        shift_labels = shift_labels.contiguous()
    # Flatten the tokens
    loss_fct = nn.CrossEntropyLoss()
```

but with self.loss_fn.

I realize this is not a perfect solution and doesn't scale, but we are planning a big refactor of the multimodal models, and the vocab size issue for labels will be resolved by then. Therefore I don't want us to add more abstraction that might be deleted later on.

Comment on lines 333 to 338

```python
new_vocab_size = config.get_text_config().vocab_size + 10
model.set_vocab_size(new_vocab_size)
self.assertEqual(model.language_model.get_vocab_size(), new_vocab_size)
# model's get vocab returns language model's vocab size
self.assertEqual(model.language_model.get_vocab_size(), model.get_vocab_size())
```
Member


To be aligned with the linked issue, let's also test that the model can take labels and return a loss after resizing.
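The requested check can be sketched in miniature with plain PyTorch: after growing the vocabulary, a forward pass whose labels use the new token ids should still produce a finite loss. A tiny embedding-plus-head stand-in replaces the real model here; all names and sizes are illustrative, not the actual test added in the PR.

```python
# Hedged stand-in for the requested test: labels containing newly added
# token ids should still yield a finite loss after the vocab is grown.
import torch
import torch.nn as nn

old_vocab, new_vocab, hidden = 8, 12, 4

embed = nn.Embedding(new_vocab, hidden)            # already-resized embedding
lm_head = nn.Linear(hidden, new_vocab, bias=False)  # already-resized head

# input_ids deliberately include ids beyond the old vocab size.
input_ids = torch.tensor([[1, 3, old_vocab, new_vocab - 1]])
labels = input_ids.clone()

logits = lm_head(embed(input_ids))
shift_logits = logits[..., :-1, :].contiguous()
shift_labels = labels[..., 1:].contiguous()
loss = nn.CrossEntropyLoss()(
    shift_logits.view(-1, new_vocab), shift_labels.view(-1)
)
```

If the head or the loss still used the old vocab size, the ids `old_vocab` and above would raise an index error or produce a shape mismatch, which is exactly the failure mode from #36590.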

Contributor Author


Good call. I added the test in the follow-up PR #36840.

Contributor Author


I also added you as a collaborator on my fork in case it's needed; the invite is pending.

@Ssukriti
Contributor Author

@zucchini-nlp sure, we can go with that approach instead. I created a new PR, #36840, since I had to undo the changes here.

If the solution looks OK, I can continue with the unit tests. Thank you!
