Misc. bug: The Llama server starts with 4 slots by default. #17300

@sxch775-work

Name and Version

llama-server from the release package llama-b7079-bin-win-cpu-x64 (build b7079)

Operating systems

No response

Which llama.cpp modules do you know to be affected?

No response

Command line

C:\Users\21\Downloads\llama-b7079-bin-win-cpu-x64\llama-server.exe -m "C:\Users\21\Downloads\Qwen3-VL-4B-Instruct-Q8_0.gguf" -c 8000 --port 1280 --api-key 123 --mmproj C:\Users\21\Downloads\mmproj-Qwen3-VL-4B-Instruct-Q8_0.gguf -np 1

Problem description & steps to reproduce

I started llama-server with the command below, passing -np 1 and -c 8000, and expected it to launch a single slot with a context of 8000 tokens. Instead, the server initialized four slots, and the log reports n_ctx = 8192 for each of them.

This is the command:
C:\Users\21\Downloads\llama-b7079-bin-win-cpu-x64\llama-server.exe -m "C:\Users\21\Downloads\Qwen3-VL-4B-Instruct-Q8_0.gguf" -c 8000 --port 1280 --api-key 123 --mmproj C:\Users\21\Downloads\mmproj-Qwen3-VL-4B-Instruct-Q8_0.gguf -np 1

This is the startup log:
load_hparams: model size: 800.43 MiB
load_hparams: metadata size: 0.11 MiB
alloc_compute_meta: warmup with image size = 1472 x 1472
alloc_compute_meta: CPU compute buffer size = 322.49 MiB
alloc_compute_meta: graph splits = 1, nodes = 766
warmup: flash attention is enabled
srv load_model: loaded multimodal model, 'C:\Users\21\Downloads\mmproj-Qwen3-VL-4B-Instruct-Q8_0.gguf'
srv init: initializing slots, n_slots = 4
slot init: id 0 | task -1 | new slot, n_ctx = 8192
slot init: id 1 | task -1 | new slot, n_ctx = 8192
slot init: id 2 | task -1 | new slot, n_ctx = 8192
slot init: id 3 | task -1 | new slot, n_ctx = 8192
srv init: prompt cache is enabled, size limit: 8192 MiB
srv init: use --cache-ram 0 to disable the prompt cache
srv init: for more info see #16391
srv init: thinking = 0
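
To make the mismatch easy to check on other builds, here is a minimal sketch that parses a startup log and compares the slot count against the value passed to -np. The function name summarize_slots and the regular expressions are my own assumptions, derived only from the log lines shown above; they are not part of llama.cpp and may need adjusting if the log format differs.

```python
import re

# Patterns are assumptions based on the log lines quoted in this report.
SLOTS_RE = re.compile(r"srv\s+init: initializing slots, n_slots = (\d+)")
SLOT_RE = re.compile(r"slot\s+init: id\s+(\d+) \|.*n_ctx = (\d+)")


def summarize_slots(log_text):
    """Return (announced slot count, {slot id: n_ctx}) parsed from a startup log."""
    announced = None
    slots = {}
    for line in log_text.splitlines():
        m = SLOTS_RE.search(line)
        if m:
            announced = int(m.group(1))
            continue
        m = SLOT_RE.search(line)
        if m:
            slots[int(m.group(1))] = int(m.group(2))
    return announced, slots


# Excerpt copied from the startup log in this report.
LOG = """\
srv init: initializing slots, n_slots = 4
slot init: id 0 | task -1 | new slot, n_ctx = 8192
slot init: id 1 | task -1 | new slot, n_ctx = 8192
slot init: id 2 | task -1 | new slot, n_ctx = 8192
slot init: id 3 | task -1 | new slot, n_ctx = 8192
"""

expected_slots = 1  # the value passed via -np on the command line
announced, slots = summarize_slots(LOG)
print(f"announced={announced} parsed={len(slots)} expected={expected_slots}")
if announced != expected_slots:
    print("slot count does not match -np")
```

Saving the server output to a file and passing its contents to summarize_slots gives the same check for a full log.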

First Bad Commit

No response

Relevant log output
