server: [RFC] add optional POST /exit endpoint for graceful shutdown #18086
- Introduce --endpoint-exit flag and LLAMA_ARG_ENDPOINT_EXIT env var
- Add endpoint_exit to common_params (disabled by default)
- Implement POST /exit with explicit confirmation token to prevent misuse
- Support graceful shutdown via injected on_shutdown callback
- Handle both router and non-router server shutdown paths
Is there any reason why you cannot use the /models/unload endpoint of router mode? The router mode is designed to run as a daemon. I recommend doing that instead of adding an /exit endpoint.
traditional OS-level signals (e.g. SIGTERM, SIGINT) are unavailable or unreliable. (for eg. Windows)
First off, if you cannot send an OS-level signal to an application, the problem lies in the way you spawn and manage the process, not in the application itself.
For the same reason, Windows has Windows services and Linux usually has something like systemd to spawn and manage daemon processes. Such a process never has to expose a shutdown mechanism to user space; instead, the user sends a request to the manager (systemctl, for example), and the manager shuts down the process.
If there is a problem with this mechanism on Windows, we should fix it rather than circumvent it by introducing yet another mechanism (e.g. an /exit endpoint).
Therefore, I am against this proposal, as it seems like an anti-pattern / misuse in terms of system design.
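The signal-based path described above can be sketched as follows. This is a minimal illustration, not llama-server's actual handler: the names `g_stop_requested`, `on_terminate_signal`, `install_signal_handlers`, and `stop_requested` are hypothetical. The process manager sends the signal, the handler only records it, and the main loop is assumed to poll the flag and run the normal cleanup path.

```cpp
#include <csignal>

namespace {
// Written by the signal handler, read by the main loop.
// sig_atomic_t is the type the standard guarantees is safe to set from a handler.
volatile std::sig_atomic_t g_stop_requested = 0;
}

// Hypothetical handler: does nothing except record that a stop was requested.
extern "C" void on_terminate_signal(int /*signum*/) {
    g_stop_requested = 1;
}

// Register the same handler for the signals a process manager typically sends.
inline void install_signal_handlers() {
    std::signal(SIGINT, on_terminate_signal);
    std::signal(SIGTERM, on_terminate_signal);
}

// The main loop is assumed to poll this and then perform cleanup itself.
inline bool stop_requested() {
    return g_stop_requested != 0;
}
```

With this shape, cleanup always runs on the main thread, regardless of who asked for the shutdown.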
const json body = json::parse(req.body);
const std::string confirm = json_value(body, "confirm", std::string());

if (confirm != "shutdown") {
This is not a good way to design an API either. I don't get how a confirmation can prevent someone from accidentally exiting the server if the request is sent by a program and not a human.
We are designing an Application Programming Interface here, not a Human Interface.
I don't get why a confirmation can prevent someone accidentally exit the server if the request is sent by a program and not a human
This endpoint is disabled by default and needs a flag or an environment variable to enable it, so I don't think there is any cause for accidents. But I agree that the API design could be better. This is just a working PoC RFC PR.
std::this_thread::sleep_for(std::chrono::milliseconds(100));
SRV_INF("%s: executing on_shutdown callback...\n", __func__);
try {
    shutdown_cb();
This will definitely cause deadlocks in some cases. An HTTP thread should never kill itself (shutdown_cb will invoke termination of all HTTP threads).
I tried to use ctx_server.queue_tasks.terminate() in the hope that control would return to main(), where shutdown and cleanup are called, but I think that only works in non-router mode, since it doesn't shut down the HTTP server thread. Please correct me if I'm wrong.
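One common way to avoid the self-termination problem raised in this thread is to have the request handler only signal the main thread and return, leaving the actual teardown to main(). The sketch below illustrates that pattern under stated assumptions: `shutdown_latch`, `handle_exit_request`, and `run_until_shutdown` are hypothetical names and do not exist in llama-server, and a plain `std::thread` stands in for an HTTP worker.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical helper: lets a request handler ask for shutdown
// without performing it on the HTTP thread.
class shutdown_latch {
public:
    // Called from a worker thread: records the request and returns immediately.
    void request() {
        {
            std::lock_guard<std::mutex> lock(mtx_);
            requested_ = true;
        }
        cv_.notify_all();
    }

    // Called from the main thread: blocks until a request has been recorded.
    void wait() {
        std::unique_lock<std::mutex> lock(mtx_);
        cv_.wait(lock, [this] { return requested_; });
    }

    bool requested() const {
        std::lock_guard<std::mutex> lock(mtx_);
        return requested_;
    }

private:
    mutable std::mutex mtx_;
    std::condition_variable cv_;
    bool requested_ = false;
};

// Stand-in for the /exit handler body: signal and return, never stop the server here.
inline void handle_exit_request(shutdown_latch & latch) {
    latch.request();
}

// Stand-in for main(): waits for the request, then tears the worker down itself.
inline bool run_until_shutdown(shutdown_latch & latch) {
    std::thread http_worker([&latch] { handle_exit_request(latch); });
    latch.wait();        // the main thread blocks here, not the HTTP thread
    http_worker.join();  // safe: the worker never joins or stops itself
    return latch.requested();
}
```

Because the worker returns before any teardown starts, no thread ever waits on its own termination. Whether this maps cleanly onto the router-mode shutdown path is an open question in this thread.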
That only unloads the model; it does not terminate the server process, especially in router mode.
In router mode, unload model == terminate the child process holding the model.
This PR is about the concept of a client gracefully terminating the main router process, since the client manages it (both launching and termination).
Summary
This PR proposes adding an optional, explicitly gated HTTP endpoint (POST /exit) to llama-server that allows a client application to request a graceful server shutdown when traditional OS-level signals (e.g. SIGTERM, SIGINT) are unavailable or unreliable (e.g. on Windows).
The endpoint is disabled by default and can only be enabled via an explicit command-line flag or environment variable.
Motivation
Many llama-server deployments are no longer run as simple foreground processes where POSIX signals are always available.
In these environments, the client cannot reliably send SIGTERM or SIGINT.
As a result, client applications currently resort to hard process termination.
An application-level shutdown API provides a portable, explicit, and graceful alternative for clients to ask the server to clean up and shut down.
Proposed Solution
Introduce a POST-only endpoint: POST /exit
Behavior
/exit return an error indicating the endpoint is not supported
Example Request
Example Response:
{ "message": "Server shutdown initiated", "status": "terminating" }
Configuration & Safety Guarantees
Disabled by Default
The endpoint is off by default.
It must be explicitly enabled using either the --endpoint-exit command-line flag
or the LLAMA_ARG_ENDPOINT_EXIT environment variable.
I will update the docs after getting input and feedback, before merging.
Please note: This is definitely not intended for public servers!!
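The "disabled by default" gating can be sketched roughly as below. This is an illustration under assumptions, not the PR's code: `exit_endpoint_config`, `env_flag_enabled`, `resolve_config`, and `exit_endpoint_status` are hypothetical names, the accepted environment values ("1" or "true") are a guess, and the status codes (501 when disabled, 405 for non-POST) are placeholders, since the description only says that a disabled endpoint returns a "not supported" error.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical settings holder; the PR describes an `endpoint_exit` field
// on common_params that is disabled by default.
struct exit_endpoint_config {
    bool endpoint_exit = false;
};

// Assumption: the variable counts as "enabled" when set to "1" or "true".
inline bool env_flag_enabled(const char * name) {
    const char * value = std::getenv(name);
    if (value == nullptr) {
        return false;
    }
    const std::string s(value);
    return s == "1" || s == "true";
}

// The endpoint is enabled if either the CLI flag or the env var asks for it.
inline exit_endpoint_config resolve_config(bool cli_flag) {
    exit_endpoint_config cfg;
    cfg.endpoint_exit = cli_flag || env_flag_enabled("LLAMA_ARG_ENDPOINT_EXIT");
    return cfg;
}

// Returns the HTTP status a handler might send (status codes are assumptions).
inline int exit_endpoint_status(const exit_endpoint_config & cfg, const std::string & method) {
    if (!cfg.endpoint_exit) {
        return 501;  // endpoint not enabled: report "not supported"
    }
    if (method != "POST") {
        return 405;  // POST-only endpoint
    }
    return 200;      // shutdown would be initiated here
}
```

Keeping the enable check first means a server started without the flag never reaches the shutdown path at all.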
Current issues