path: root/examples/server/README.md
Age | Commit message | Author
2024-08-12 | Merge mainline - Aug 12 2024 (#17) | Kawrakow
2024-07-27 | Merge mainline llama.cpp (#3) | Kawrakow
2024-06-13 | `build`: rename main → llama-cli, server → llama-server, llava-cli → ll... | Olivier Chafik
2024-06-07 | server: update cache_prompt documentation [no ci] (#7745) | Johannes Gäßler
2024-05-19 | server: add test for token probs (#7347) | Johannes Gäßler
2024-05-18 | server: correct --threads documentation [no ci] (#7362) | Johannes Gäßler
2024-05-17 | [Server] Added --verbose option to README [no ci] (#7335) | Leon Knauer
2024-05-14 | docs: Fix typo and update description for --embeddings flag (#7026) | Ryuei
2024-05-08 | server : add_special option for tokenize endpoint (#7059) | Johan
2024-05-07 | server: fix incorrectly reported token probabilities (#7125) | Johannes Gäßler
2024-05-07 | server : update readme with undocumented options (#7013) | Kyle Mistele
2024-04-29 | build(cmake): simplify instructions (`cmake -B build && cmake --build build .... | Olivier Chafik
2024-04-12 | JSON schema conversion: ⚡️ faster repetitions, min/maxLength for strings,... | Olivier Chafik
2024-04-08 | llama : save and restore kv cache for single seq id (#6341) | Jan Boon
2024-04-04 | server : remove obsolete --memory-f32 option | Georgi Gerganov
2024-04-03 | A few small fixes to server's README docs (#6428) | Fattire
2024-03-26 | cuda : rename build flag to LLAMA_CUDA (#6299) | slaren
2024-03-25 | Server: clean up OAI params parsing function (#6284) | Xuan Son Nguyen
2024-03-23 | common: llama_load_model_from_url split support (#6192) | Pierrick Hymbert
2024-03-23 | server: docs: `--threads` and `--threads`, `--ubatch-size`, `--log-disable` (... | Pierrick Hymbert
2024-03-21 | server : update readme doc from `slot_id` to `id_slot` (#6213) | Jan Boon
2024-03-17 | common: llama_load_model_from_url using --model-url (#6098) | Pierrick Hymbert
2024-03-11 | Update server docker image URLs (#5997) | Jakub N
2024-03-11 | Server: format error to json (#5961) | Xuan Son Nguyen
2024-03-09 | server : clarify some items in the readme (#5957) | Georgi Gerganov
2024-03-09 | Server: reorganize some http logic (#5939) | Xuan Son Nguyen
2024-03-09 | server : add SSL support (#5926) | Gabe Goodhart
2024-03-07 | server : refactor (#5882) | Georgi Gerganov
2024-03-03 | server : init http requests thread pool with --parallel if set (#5836) | Pierrick Hymbert
2024-03-01 | server : remove api_like_OAI.py proxy script (#5808) | Georgi Gerganov
2024-03-01 | server: allow to override threads server pool with --threads-http (#5794) | Pierrick Hymbert
2024-02-25 | server: docs - refresh and tease a little bit more the http server (#5718) | Pierrick Hymbert
2024-02-25 | server: logs - unified format and --log-format option (#5700) | Pierrick Hymbert
2024-02-25 | server: concurrency fix + monitoring - add /metrics prometheus compatible end... | Pierrick Hymbert
2024-02-24 | server: init functional tests (#5566) | Pierrick Hymbert
2024-02-22 | server : clarify some params in the docs (#5640) | Alexey Parfenov
2024-02-22 | Add docs for llama_chat_apply_template (#5645) | Xuan Son Nguyen
2024-02-21 | server: health: fix race condition on slots data using tasks queue (#5634) | Pierrick Hymbert
2024-02-20 | server : health endpoint configurable failure on no slot (#5594) | Pierrick Hymbert
2024-02-18 | common, server : surface min_keep as its own parameter (#5567) | Robey Holderith
2024-02-18 | server : slots monitoring endpoint (#5550) | Pierrick Hymbert
2024-02-18 | server : enhanced health endpoint (#5548) | Pierrick Hymbert
2024-02-18 | server : --n-predict option document and cap to max value (#5549) | Pierrick Hymbert
2024-02-16 | server : add "samplers" param to control the samplers order (#5494) | Alexey Parfenov
2024-02-16 | ggml : add numa options (#5377) | bmwl
2024-02-11 | server : allow to specify tokens as strings in logit_bias (#5003) | Alexey Parfenov
2024-02-07 | server : update `/props` with "total_slots" value (#5373) | Justin Parker
2024-02-06 | server : add `dynatemp_range` and `dynatemp_exponent` (#5352) | Michael Coppola
2024-02-05 | server : allow to get default generation settings for completion (#5307) | Alexey Parfenov
2024-01-30 | server : improve README (#5209) | Wu Jian Ping