llama : add support for NVIDIA Nemotron 3 Nano #18058
Conversation
This commit adds support for the NVIDIA Nemotron Nano 3 model, enabling the model to be converted and run.
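For context, the workflow this enables would look roughly like the sketch below. This is a minimal illustration assuming the standard llama.cpp conversion path applies to this architecture; the model directory, output file name, and quantization type are placeholders, not values taken from this PR.

```python
# Sketch of the usual llama.cpp convert-and-run flow (placeholder paths).
import subprocess

# 1. Convert the Hugging Face checkpoint to GGUF using the support added in
#    this PR (assumes convert_hf_to_gguf.py recognizes the architecture).
subprocess.run(
    [
        "python", "convert_hf_to_gguf.py",
        "path/to/Nemotron-3-Nano",           # local HF model directory (placeholder)
        "--outfile", "nemotron-3-nano-f16.gguf",
        "--outtype", "f16",
    ],
    check=True,
)

# 2. Run a quick generation with llama-cli to sanity-check the conversion.
subprocess.run(
    [
        "llama-cli",
        "-m", "nemotron-3-nano-f16.gguf",
        "-p", "Hello, who are you?",
        "-ngl", "99",                        # offload layers to GPU if available
    ],
    check=True,
)
```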
GGUFs work great! I converted them via the PR at https://huggingface.co/unsloth/Nemotron-3-Nano-30B-A3B-GGUF
@danielhanchen please wait until it's merged next time.

We were launch partners with Nvidia, so we supported finetuning out of the gate and announced the GGUFs with it. The GGUFs also already work in LMStudio as this llama.cpp PR was merged. I will need to change the instructions for llama.cpp in our guide.

It's a very good model, happy to see it supported so quickly on release.
ggerganov left a comment
I think we have an issue with parsing the reasoning tokens:
Hoping that we'll get support from the community to fix this. Pinging @aldehir
Let's merge after the CI is green.
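For anyone who wants to check the reasoning parsing themselves, one rough way is to query a locally running llama-server through its OpenAI-compatible endpoint and see whether the thinking text ends up in `reasoning_content` or leaks into `content`. This is only a sketch under assumptions: the server is on localhost:8080 with the Nemotron GGUF loaded, the Jinja chat template is enabled, and a reasoning format that splits thinking from the answer is active.

```python
# Quick check of how llama-server separates reasoning from the final answer.
# Assumes a llama-server instance on localhost:8080 with reasoning parsing enabled.
import json
import urllib.request

payload = {
    "model": "nemotron-3-nano",  # placeholder name; not significant for a single-model server
    "messages": [{"role": "user", "content": "What is 17 * 23? Think step by step."}],
}

req = urllib.request.Request(
    "http://localhost:8080/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(req) as resp:
    message = json.loads(resp.read())["choices"][0]["message"]

# If parsing works, the thinking should land in reasoning_content and the answer in content.
print("reasoning_content:", message.get("reasoning_content"))
print("content:", message.get("content"))
```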
Does it have something to do with this, from unsloth: Nemotron 3 chat template format:

Even with
I will take a look.

LMStudio seems to support it and the thinking, so there has to be some way to make it work, but yeah, during normal generation on the server WebUI I didn't see the closing thinking tag (the opening one, I assume, is appended to the generation prompt, so it's a typical
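If the template really does append the opening tag to the generation prompt, then the raw completion only ever contains the closing tag, and a parser that waits for a matching opening tag never triggers. A tiny illustration of that case follows; the `</think>` tag name is my assumption for illustration, not something confirmed for Nemotron 3 in this thread.

```python
# Illustration: when the opening think tag is part of the prompt, the model's
# raw output starts directly with reasoning and only contains the CLOSING tag.
# The tag name "</think>" is an assumption used for illustration.
CLOSE_TAG = "</think>"

def split_reasoning(raw_output: str) -> tuple[str, str]:
    """Split reasoning from the final answer when only a closing tag is present."""
    if CLOSE_TAG in raw_output:
        reasoning, answer = raw_output.split(CLOSE_TAG, 1)
        return reasoning.strip(), answer.strip()
    # No closing tag seen yet (e.g. mid-stream): treat everything as reasoning.
    return raw_output.strip(), ""

raw = "First 17*20=340, then 17*3=51, so 391." + CLOSE_TAG + " The answer is 391."
reasoning, answer = split_reasoning(raw)
print("reasoning:", reasoning)
print("answer:", answer)
```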
CISC left a comment
Nits only, not required to apply any.
@pwilkin Actually, I re-checked and you're correct. But yes
I'm thinking that before we fix the reasoning parsing, there is no point in merging this PR. So let's put it on hold until we figure it out.

@ggerganov I have the changes ready, although they support reasoning + tool calling, so they're not small. How would you like me to proceed? I can provide a subset of the changes that only addresses the reasoning and then add the rest in another PR.
The models are now available on Hugging Face.
Tech blog: https://developer.nvidia.com/blog/inside-nvidia-nemotron-3-techniques-tools-and-data-that-make-it-efficient-and-accurate/