Age | Commit message (Collapse) | Author |
|
* server: health: fix race condition on slots data using tasks queue
* server: health:
* include_slots only if slots_endpoint
* fix compile warning task.target_id not initialized.
|
|
* server: use llama_chat_apply_template
* server: remove trailing space
* server: fix format_chat
* server: fix help message
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* server: fix formatted_chat
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
|
|
This updates the server queue to support graceful shutdown of the server on signals.
|
|
* server: add mistral chat template
* server: fix typo
* server: rename template mistral to llama2
* server: format_llama2: remove BOS
* server: validate "--chat-template" argument
* server: clean up using_chatml variable
Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>
---------
Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>
|
|
|
|
* server: add llama_server_queue struct
* server: add llama_server_response_event
* server: add comments
* server: move all mutexes away from server.cpp
* server: correct multitask response
* server: only add back deferred tasks when one slot is available
* server: fix a race condition cause by "request_completion"
|