llama : add support for NVIDIA Nemotron 3 Nano #18058
Conversation
This commit adds support for the NVIDIA Nemotron Nano 3 model, enabling the model to be converted and run.
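For context, the workflow this enables would look roughly like the sketch below. This is a minimal illustration assuming the standard llama.cpp conversion path applies to this architecture; the model directory, output file name, and quantization type are placeholders, not values taken from this PR.

```python
# Sketch of the usual llama.cpp convert-and-run flow (placeholder paths).
import subprocess

# 1. Convert the Hugging Face checkpoint to GGUF using the support added in
#    this PR (assumes convert_hf_to_gguf.py recognizes the architecture).
subprocess.run(
    [
        "python", "convert_hf_to_gguf.py",
        "path/to/Nemotron-3-Nano",           # local HF model directory (placeholder)
        "--outfile", "nemotron-3-nano-f16.gguf",
        "--outtype", "f16",
    ],
    check=True,
)

# 2. Run a quick generation with llama-cli to sanity-check the conversion.
subprocess.run(
    [
        "llama-cli",
        "-m", "nemotron-3-nano-f16.gguf",
        "-p", "Hello, who are you?",
        "-ngl", "99",                        # offload layers to GPU if available
    ],
    check=True,
)
```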
GGUFs work great! I converted them via the PR at https://huggingface.co/unsloth/Nemotron-3-Nano-30B-A3B-GGUF
@danielhanchen please wait until it's merged next time.

We were launch partners with Nvidia, so we supported finetuning out of the gate and announced the GGUFs with it. The GGUFs also already work in LMStudio as this llama.cpp PR was merged. I will need to change the instructions for llama.cpp in our guide.

It's a very good model, happy to see it supported so quickly on release.
ggerganov left a comment
I think we have an issue with parsing the reasoning tokens:
Hoping that we'll get support from the community to fix this. Pinging @aldehir
Let's merge after the CI is green.
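For anyone who wants to check the reasoning parsing themselves, one rough way is to query a locally running llama-server through its OpenAI-compatible endpoint and see whether the thinking text ends up in `reasoning_content` or leaks into `content`. This is only a sketch under assumptions: the server is on localhost:8080 with the Nemotron GGUF loaded, the Jinja chat template is enabled, and a reasoning format that splits thinking from the answer is active.

```python
# Quick check of how llama-server separates reasoning from the final answer.
# Assumes a llama-server instance on localhost:8080 with reasoning parsing enabled.
import json
import urllib.request

payload = {
    "model": "nemotron-3-nano",  # placeholder name; not significant for a single-model server
    "messages": [{"role": "user", "content": "What is 17 * 23? Think step by step."}],
}

req = urllib.request.Request(
    "http://localhost:8080/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(req) as resp:
    message = json.loads(resp.read())["choices"][0]["message"]

# If parsing works, the thinking should land in reasoning_content and the answer in content.
print("reasoning_content:", message.get("reasoning_content"))
print("content:", message.get("content"))
```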
Does it have something to do with this, from unsloth: Nemotron 3 chat template format:

Even with
I will take a look.

LMStudio seems to support it and the thinking, so there has to be some way to make it work, but yeah, during normal generation on the server WebUI I didn't see the closing thinking tag (the opening one, I assume, is appended to the generation prompt, so it's a typical
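If the template really does append the opening tag to the generation prompt, then the raw completion only ever contains the closing tag, and a parser that waits for a matching opening tag never triggers. A tiny illustration of that case follows; the `</think>` tag name is my assumption for illustration, not something confirmed for Nemotron 3 in this thread.

```python
# Illustration: when the opening think tag is part of the prompt, the model's
# raw output starts directly with reasoning and only contains the CLOSING tag.
# The tag name "</think>" is an assumption used for illustration.
CLOSE_TAG = "</think>"

def split_reasoning(raw_output: str) -> tuple[str, str]:
    """Split reasoning from the final answer when only a closing tag is present."""
    if CLOSE_TAG in raw_output:
        reasoning, answer = raw_output.split(CLOSE_TAG, 1)
        return reasoning.strip(), answer.strip()
    # No closing tag seen yet (e.g. mid-stream): treat everything as reasoning.
    return raw_output.strip(), ""

raw = "First 17*20=340, then 17*3=51, so 391." + CLOSE_TAG + " The answer is 391."
reasoning, answer = split_reasoning(raw)
print("reasoning:", reasoning)
print("answer:", answer)
```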
CISC left a comment
Nits only, not required to apply any.
@pwilkin Actually, I re-checked and you're correct. But yes
I'm thinking that before we fix the reasoning parsing, there is no point in merging this PR. So let's put it on hold until we figure it out.

@ggerganov I have the changes ready, although they support reasoning + tool calling, so they're not small. How would you like me to proceed? I can provide a subset of the changes that only addresses the reasoning and then add the rest in another PR.
The models are now available on Hugging Face.
Tech blog: https://developer.nvidia.com/blog/inside-nvidia-nemotron-3-techniques-tools-and-data-that-make-it-efficient-and-accurate/