Description
System Info
Code:
```python
import torch
from transformers import AutoModelForImageTextToText, AutoProcessor

base_model = "mistralai/Ministral-3-8B-Instruct-2512-BF16"
model = AutoModelForImageTextToText.from_pretrained(base_model, dtype=torch.bfloat16)
model = model.to("cuda:1")
# AutoProcessor returns the model's processor (tokenizer + image processor)
tokenizer = AutoProcessor.from_pretrained(base_model)

user_prompt = "hello how are you?"
messages = [
    {"role": "user", "content": user_prompt},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text=text, return_tensors="pt").to(model.device, dtype=torch.bfloat16)
generate_ids = model.generate(**inputs, max_new_tokens=50, do_sample=False)
decoded_output = tokenizer.batch_decode(generate_ids[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True)[0]
print(decoded_output)
```
Output:
```
Hello!ĠðŁĺĬĠI'mĠjustĠaĠvirtualĠassistant,ĠsoĠIĠdon'tĠhaveĠfeelings,ĠbutĠI'mĠhereĠandĠreadyĠtoĠhelpĠyouĠwithĠanythingĠyouĠneed!ĠHowĠaboutĠyouâĢĶhowĠareĠyouĠdoingĠtoday?ĠAnythingĠfunĠorĠinterestingĠon
```
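The markers look like raw byte-level BPE token characters. Below is a minimal sketch, assuming a GPT-2-style `bytes_to_unicode` alphabet (which this checkpoint may or may not use): inverting that mapping recovers clean UTF-8 text from the garbled output, which suggests the token-to-string step of decoding is being skipped rather than generation being broken.

```python
# Sketch only: invert the GPT-2-style byte-level BPE alphabet and apply it to
# the garbled output. If this recovers readable text, the decode path is
# concatenating raw token strings without running the byte decoder.

def bytes_to_unicode():
    # Standard byte <-> unicode table used by GPT-2-style byte-level BPE.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))

byte_decoder = {u: b for b, u in bytes_to_unicode().items()}
garbled = "Hello!ĠðŁĺĬĠI'mĠjustĠaĠvirtualĠassistant,"
print(bytes([byte_decoder[ch] for ch in garbled]).decode("utf-8"))
# -> Hello! 😊 I'm just a virtual assistant,
```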
Environment:
- Python 3.12.7
- transformers 5.0.0.dev0 (installed from main branch)
- torch 2.9.0
- mistral_common 1.8.6
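If useful, a quick way to capture the same version numbers (standard library only):

```python
# Print the interpreter and package versions for the report.
import sys
from importlib.metadata import version

print("Python", sys.version.split()[0])
for pkg in ("transformers", "torch", "mistral_common"):
    print(pkg, version(pkg))
```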
The same code works when the tokenizer is loaded with MistralCommonBackend:
Code:
```python
import torch
from transformers import AutoModelForImageTextToText, MistralCommonBackend

base_model = "mistralai/Ministral-3-8B-Instruct-2512-BF16"
tokenizer = MistralCommonBackend.from_pretrained(base_model)
model = AutoModelForImageTextToText.from_pretrained(
    base_model, dtype=torch.bfloat16
)
model = model.to("cuda:2")

user_prompt = "hello how are you?"
messages = [
    {"role": "user", "content": user_prompt},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text=text, return_tensors="pt").to(model.device)
generate_ids = model.generate(**inputs, max_new_tokens=50, do_sample=False)
decoded_output = tokenizer.batch_decode(generate_ids[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True)[0]
print(decoded_output)
```
Output:
```
Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you with anything you need. How about you? How are you doing today?[😊]
```
Who can help?
No response
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
Run the first code snippet above (model loaded with AutoModelForImageTextToText, tokenizer loaded via AutoProcessor) and compare its output with the MistralCommonBackend version.
Expected behavior
Clean, human-readable output, with byte-level BPE markers (Ġ, ðŁĺĬ, etc.) converted back to text during decoding, matching the output produced with MistralCommonBackend.
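A possible pointer for debugging, offered as a hedged sketch: the garbled string looks like raw token strings joined without running the token-to-string conversion. Assuming the processor from the first snippet (there named `tokenizer`) exposes its inner tokenizer as `.tokenizer`, as most ProcessorMixin subclasses do, the two paths can be compared directly:

```python
# Hedged diagnostic: compare raw token-string concatenation with a proper
# decode. `generate_ids`, `inputs`, and the processor (`tokenizer`) are the
# objects from the first (AutoProcessor) snippet; the `.tokenizer` attribute
# as the inner tokenizer is an assumption.
new_ids = generate_ids[0, inputs["input_ids"].shape[1]:].tolist()
inner = tokenizer.tokenizer
print("".join(inner.convert_ids_to_tokens(new_ids)))    # reproduces the Ġ-marked string?
print(inner.decode(new_ids, skip_special_tokens=True))  # should be clean text
```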