
Releases: ggml-org/llama.cpp

b7418

16 Dec 08:25
2995341


Warning

Release Format Update: Linux releases will soon use .tar.gz archives instead of .zip. Please make the necessary changes to your deployment scripts.
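A deployment script can be made tolerant of the format change by accepting either archive type. The following is a minimal sketch, not an official tool: the function name `extract_release` is hypothetical, and the only assumption is that the archive can be recognized by its file extension.

```python
import tarfile
import zipfile
from pathlib import Path


def extract_release(archive: Path, dest: Path) -> None:
    """Unpack a release archive in either .zip or .tar.gz format.

    Only use this on archives from a source you trust: extraction
    writes whatever paths the archive contains.
    """
    dest.mkdir(parents=True, exist_ok=True)
    name = archive.name.lower()
    if name.endswith((".tar.gz", ".tgz")):
        # New format announced for Linux releases.
        with tarfile.open(archive, mode="r:gz") as tf:
            tf.extractall(dest)
    elif name.endswith(".zip"):
        # Format used so far.
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(dest)
    else:
        raise ValueError(f"unsupported archive format: {archive.name}")
```

Dispatching on the extension keeps the same script working before and after the switch, so it can be rolled out ahead of the change.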

llama : add support for NVIDIA Nemotron 3 Nano (#18058)

  • llama : add support for NVIDIA Nemotron 3 Nano

This commit adds support for the NVIDIA Nemotron 3 Nano model, enabling
the conversion and running of this model.

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>


b7415

16 Dec 00:45
c45f89d


ggml-hexagon: mm for mtmd (#17894)

  • feat: add run_mtmd script for hexagon

  • fix: fix issue in fp16xfp32 mm

  • fix: remove opt_experiment for fp16xfp32 mm

  • fix: ggml-hexagon: matmul fp16xfp32 support non-contiguous src0

  • fix: fix syntax check for run-mtmd.sh for cli


b7414

16 Dec 01:09
9d52f17


model : add KORMo model (#18032)

  • vocab: add KORMo Tokenizer

  • model: add KORMoForCausalLM

  • vocab: change pretokenizer to qwen2

  • lint: fix unintended line removal

  • model: make qwen2 bias tensor optional

  • model: use qwen2 architecture for KORMo


b7413

16 Dec 00:18
4529c66


kv-cache: Fix state restore fragmented cache (#17982)

  • kv-cache : fix state restore with fragmented cache (#17527)

Change find_slot to allow non-contiguous allocation during state restore. Fixes 'failed to find available cells in kv cache' error when restoring state to fragmented cache.

  • tests : update logic

  • cleanup: tightened state_read_meta sig, added is_contiguous case

  • fix: state_read_meta arg reorder loose ends


Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
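The difference between contiguous and non-contiguous allocation can be illustrated with a toy model. This is a sketch only and not llama.cpp's `find_slot`: the function `find_cells` and the boolean `used` list are hypothetical stand-ins for the cache's cell bookkeeping.

```python
from typing import List, Optional


def find_cells(used: List[bool], n: int, contiguous: bool) -> Optional[List[int]]:
    """Return the indices of n free cells, or None if the request cannot be met.

    used[i] is True when cell i is occupied.
    """
    if n == 0:
        return []
    if not contiguous:
        # Any n free cells will do, wherever they are.
        free = [i for i, occupied in enumerate(used) if not occupied]
        return free[:n] if len(free) >= n else None
    # Contiguous case: scan for a run of n consecutive free cells.
    run_start, run_len = 0, 0
    for i, occupied in enumerate(used):
        if occupied:
            run_len = 0
            continue
        if run_len == 0:
            run_start = i
        run_len += 1
        if run_len == n:
            return list(range(run_start, run_start + n))
    return None


# A fragmented cache: five free cells, but no run longer than two.
cache = [False, True, False, True, False, False, True, False]
print(find_cells(cache, 3, contiguous=True))   # None
print(find_cells(cache, 3, contiguous=False))  # [0, 2, 4]
```

In this toy model the contiguous search fails even though enough cells are free in total, which mirrors the kind of failure the commit message describes for a fragmented cache.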


b7411

15 Dec 19:47
165caaf


metal: use shared buffers on eGPU (#17866)

  • metal: use shared buffers on eGPU

With #15906, I noticed an important regression when using the Metal backend on an eGPU.
This commit restores the previous behavior and adds an option to force its activation.



b7410

15 Dec 18:01
96a181a


mtmd: refactor audio preprocessing (#17978)

  • mtmd: refactor audio preprocessing

  • refactor

  • wip

  • wip (2)

  • improve constructor

  • fix use_natural_log

  • fix padding for short input

  • clean up

  • remove need_chunking


Co-authored-by: Tarek <tdakhran@users.noreply.github.com>


b7406

15 Dec 04:15
4aced7a


[SYCL] Support gpt-oss by OPs add-id, mul_mat for mxfp4, swiglu_oai (#17826)

  • support gpt-oss GPU by OP add-id, mul_mat for mxfp4, swiglu_oai, fix warning

  • fix faulty unit-test case, update ops.md

  • rebase, fix format issue


b7405

15 Dec 04:09
745fa0e


model : add glm-asr support (#17901)

  • [model] add glm-asr support

  • fix format for ci

  • fix convert format for ci

  • update glm_asr convert script & use build_ffn for glm_asr clip & use build_stack for padding and review

  • check root architecture for convert hf script

  • fix conflict with upstream

  • fix convert script for glm asr & format clip-impl

  • format

  • restore hparams text

  • improved conversion


Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>


b7404

14 Dec 22:34
5239229


preset: handle negated arg, reverse the meaning if needed (#18041)
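The general idea of handling a negated argument by reversing its meaning can be sketched as follows. This is an illustration under stated assumptions, not the code from this change: the `no-` prefix, the option names, and the function `normalize_flags` are all hypothetical.

```python
from typing import Dict, Set


def normalize_flags(
    preset: Dict[str, bool], known: Set[str], prefix: str = "no-"
) -> Dict[str, bool]:
    """Map negated keys such as "no-x" onto "x" with the value inverted."""
    result: Dict[str, bool] = {}
    for key, value in preset.items():
        if key in known:
            # Positive form: keep the value as given.
            result[key] = value
        elif key.startswith(prefix) and key[len(prefix):] in known:
            # Negated form: store under the positive name, reversed.
            result[key[len(prefix):]] = not value
        else:
            raise KeyError(f"unknown preset key: {key}")
    return result


# Hypothetical option names, for illustration only.
print(normalize_flags({"no-color": True, "verbose": True}, {"color", "verbose"}))
# {'color': False, 'verbose': True}
```

Normalizing to the positive name means the rest of the program only ever has to check one spelling of each option.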


b7402

14 Dec 19:48
37f5a10


mtmd: enhance image resizing in llava_uhd (#18014)
