Description
Name and Version
llama-server
llama-b7079-bin-win-cpu-x64
Operating systems
No response
Which llama.cpp modules do you know to be affected?
No response
Command line
C:\Users\21\Downloads\llama-b7079-bin-win-cpu-x64\llama-server.exe -m "C:\Users\21\Downloads\Qwen3-VL-4B-Instruct-Q8_0.gguf" -c 8000 --port 1280 --api-key 123 --mmproj C:\Users\21\Downloads\mmproj-Qwen3-VL-4B-Instruct-Q8_0.gguf -np 1
Problem description & steps to reproduce
I used the following command to start llama-server, expecting it to launch one slot with a context of 8000, but the server started four slots, each with n_ctx = 8192.
This is the command:
C:\Users\21\Downloads\llama-b7079-bin-win-cpu-x64\llama-server.exe -m "C:\Users\21\Downloads\Qwen3-VL-4B-Instruct-Q8_0.gguf" -c 8000 --port 1280 --api-key 123 --mmproj C:\Users\21\Downloads\mmproj-Qwen3-VL-4B-Instruct-Q8_0.gguf -np 1
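For reference, a minimal sketch of the same invocation with placeholder paths (model.gguf and mmproj.gguf stand in for the two files above); the behavior should not depend on my local paths:
llama-server.exe -m model.gguf --mmproj mmproj.gguf -c 8000 -np 1 --port 1280 --api-key 123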
This is the start log:
load_hparams: model size: 800.43 MiB
load_hparams: metadata size: 0.11 MiB
alloc_compute_meta: warmup with image size = 1472 x 1472
alloc_compute_meta: CPU compute buffer size = 322.49 MiB
alloc_compute_meta: graph splits = 1, nodes = 766
warmup: flash attention is enabled
srv load_model: loaded multimodal model, 'C:\Users\21\Downloads\mmproj-Qwen3-VL-4B-Instruct-Q8_0.gguf'
srv init: initializing slots, n_slots = 4
slot init: id 0 | task -1 | new slot, n_ctx = 8192
slot init: id 1 | task -1 | new slot, n_ctx = 8192
slot init: id 2 | task -1 | new slot, n_ctx = 8192
slot init: id 3 | task -1 | new slot, n_ctx = 8192
srv init: prompt cache is enabled, size limit: 8192 MiB
srv init: use --cache-ram 0 to disable the prompt cache
srv init: for more info see #16391
srv init: thinking = 0
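For scale, assuming each slot allocates its own context (my reading of the log, not something it states outright), the slots above add up to 4 slots × 8192 tokens = 32768 tokens of total context, roughly four times the 8000 requested with -c (8000 apparently rounded up to 8192).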
First Bad Commit
No response