Age | Commit message (Collapse) | Author |
|
* server: use llama_chat_apply_template
* server: remove trailing space
* server: fix format_chat
* server: fix help message
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* server: fix formatted_chat
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
|
|
This updates the server queue to support graceful shutdown of the server on signals.
|
|
* server: add mistral chat template
* server: fix typo
* server: rename template mistral to llama2
* server: format_llama2: remove BOS
* server: validate "--chat-template" argument
* server: clean up using_chatml variable
Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>
---------
Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>
|
|
|
|
* server: add llama_server_queue struct
* server: add llama_server_response_event
* server: add comments
* server: move all mutexes away from server.cpp
* server: correct multitask response
* server: only add back deferred tasks when one slot is available
* server: fix a race condition cause by "request_completion"
|