Rope for Qwen2--5-vl #41173
Conversation
[For maintainers] Suggested jobs to run (before merge): run-slow: qwen2_5_vl

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
gante left a comment:
The new code is not compile-compatible, but it should be fine -- if we provide position_ids, it is not reached 🤗
Hi @zucchini-nlp, unfortunately this PR (...) results in an error with the following reproducer:

```python
import torch
from transformers import AutoTokenizer, Qwen2_5_VLForConditionalGeneration
from peft import get_peft_model, PromptTuningConfig, PromptTuningInit, TaskType

max_new_tokens = 40
model_id = "Qwen/Qwen2.5-VL-3B-Instruct"
text = "Discuss the most important work by Mary Shelley."
tokenizer = AutoTokenizer.from_pretrained(model_id)
inputs = tokenizer(text, return_tensors="pt").to(0)
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(model_id, device_map=0)

prompt_tune_config = PromptTuningConfig(
    task_type=TaskType.CAUSAL_LM,
    prompt_tuning_init=PromptTuningInit.RANDOM,
    num_virtual_tokens=5,
)
model = get_peft_model(model, prompt_tune_config)
model.prompt_encoder.default.embedding.weight.data.zero_()  # make peft almost a no-op

torch.manual_seed(0)
with torch.no_grad():
    generated_ids = model.generate(
        # seq len should be 5 virtual + 9 normal tokens
        **inputs,
        max_new_tokens=max_new_tokens,
        # use_cache=False,  # without cache, it works
    )
print(tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0])
```

The error is a shape mismatch: the mismatching shapes stem from the key, which has seq len 29, whereas query and value have 15. With the previous commit, all of them have seq len 14, which is what is actually expected (5 virtual tokens + 9 normal input tokens). When disabling the cache, or with other models like Llama or Qwen3 VL, this error does not occur.

To give a short description: in prompt tuning, we create some extra embeddings and concat them with the `inputs_embeds`. The previous PR ...
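To make the shape arithmetic concrete, here is a minimal standalone sketch (illustrative only, not PEFT's actual implementation; sizes are taken from the reproducer above) of how prompt tuning prepends virtual-token embeddings:

```python
import torch

# Illustrative sizes matching the reproducer: 5 virtual tokens, 9 normal input tokens.
batch_size, num_virtual_tokens, num_input_tokens, hidden_size = 1, 5, 9, 8

inputs_embeds = torch.randn(batch_size, num_input_tokens, hidden_size)
virtual_embeds = torch.randn(batch_size, num_virtual_tokens, hidden_size)  # learned prompt embeddings

# Prompt tuning prepends the virtual tokens, so the effective sequence length becomes 5 + 9 = 14.
full_embeds = torch.cat([virtual_embeds, inputs_embeds], dim=1)
print(full_embeds.shape)  # torch.Size([1, 14, 8])
```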
Interesting, since this block was copied from Qwen2-VL for consistency 🤔 (and hoping it would fix the linked issue)
I got a very similar error with ...
I will take a look after v5-rc0. If it fails on qwen-vl, then it has been there for a looong time haha
Thanks. I'd say it's not super urgent; it can wait until after v5.
@BenjaminBossan seems like ...
Thanks for investigating, @zucchini-nlp. I looked into why that happens and found this comment:
The original PR that introduced it was huggingface/peft#1484. So IIUC, for these Qwen VL models, the comment is not true that the ...
As to how to adjust the ...:

```python
is_prefill = (model_kwargs.get("cache_position") is not None) and (model_kwargs["cache_position"][0] == 0)
...
if is_prefill:
    # virtual tokens are prepended to the inputs_embeds, so extend the cache position
    new_seq_len = model_kwargs["inputs_embeds"].shape[1]
    model_kwargs["cache_position"] = torch.arange(new_seq_len).to(
        dtype=model_kwargs["cache_position"].dtype, device=model_kwargs["cache_position"].device
    )
else:
    # leave model_kwargs["cache_position"] as is
    pass
```
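For illustration, here is how that adjustment plays out on a toy `model_kwargs` (shapes taken from the example above; a sketch, not actual PEFT code):

```python
import torch

# Toy model_kwargs with sizes from the example: 5 virtual + 9 normal tokens.
model_kwargs = {
    "inputs_embeds": torch.randn(1, 14, 8),
    "cache_position": torch.arange(9),  # before the fix: only covers the 9 "real" input tokens
}

is_prefill = (model_kwargs.get("cache_position") is not None) and (model_kwargs["cache_position"][0] == 0)
if is_prefill:
    # extend cache_position so it also covers the prepended virtual tokens
    new_seq_len = model_kwargs["inputs_embeds"].shape[1]
    model_kwargs["cache_position"] = torch.arange(new_seq_len).to(
        dtype=model_kwargs["cache_position"].dtype, device=model_kwargs["cache_position"].device
    )

print(model_kwargs["cache_position"].shape)  # torch.Size([14]), matching inputs_embeds
```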
Ah I see, that makes sense now. In that case the best option would be to fix it on the transformers side and adjust the position id preparation step. Instead of assuming that ...
Thanks so much @zucchini-nlp |
What does this PR do?
Attempt to fix #41093. I believe the `is_prefill()` logic had edge cases which were caught in the linked issue. Let's remove it, since the position ids are also prepared in `prepare_inputs_for_generation`.
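For readers less familiar with the generation internals, here is a rough sketch of the idea (assumptions only, not the actual Qwen2.5-VL code, which uses multi-dimensional RoPE position ids): rather than re-detecting prefill inside the model, position ids fall back to the cache position that the generation loop already prepares.

```python
import torch

def fallback_position_ids(position_ids, cache_position, batch_size):
    # Hypothetical helper: if the caller (e.g. generate()) already supplies
    # position_ids, trust them; otherwise derive them from cache_position,
    # which already accounts for everything that is in the KV cache.
    if position_ids is None:
        position_ids = cache_position.unsqueeze(0).expand(batch_size, -1)
    return position_ids

# Prefill over 14 positions (5 virtual + 9 normal tokens, as in the linked issue).
print(fallback_position_ids(None, torch.arange(14), batch_size=1).shape)  # torch.Size([1, 14])
```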