path: root/examples/server/README.md
| Age | Commit message | Author |
|------------|------------------------------------------------------------------------------|--------------------|
| 2024-03-17 | common: llama_load_model_from_url using --model-url (#6098) | Pierrick Hymbert |
| 2024-03-11 | Update server docker image URLs (#5997) | Jakub N |
| 2024-03-11 | Server: format error to json (#5961) | Xuan Son Nguyen |
| 2024-03-09 | server : clarify some items in the readme (#5957) | Georgi Gerganov |
| 2024-03-09 | Server: reorganize some http logic (#5939) | Xuan Son Nguyen |
| 2024-03-09 | server : add SSL support (#5926) | Gabe Goodhart |
| 2024-03-07 | server : refactor (#5882) | Georgi Gerganov |
| 2024-03-03 | server : init http requests thread pool with --parallel if set (#5836) | Pierrick Hymbert |
| 2024-03-01 | server : remove api_like_OAI.py proxy script (#5808) | Georgi Gerganov |
| 2024-03-01 | server: allow to override threads server pool with --threads-http (#5794) | Pierrick Hymbert |
| 2024-02-25 | server: docs - refresh and tease a little bit more the http server (#5718) | Pierrick Hymbert |
| 2024-02-25 | server: logs - unified format and --log-format option (#5700) | Pierrick Hymbert |
| 2024-02-25 | server: concurrency fix + monitoring - add /metrics prometheus compatible end... | Pierrick Hymbert |
| 2024-02-24 | server: init functional tests (#5566) | Pierrick Hymbert |
| 2024-02-22 | server : clarify some params in the docs (#5640) | Alexey Parfenov |
| 2024-02-22 | Add docs for llama_chat_apply_template (#5645) | Xuan Son Nguyen |
| 2024-02-21 | server: health: fix race condition on slots data using tasks queue (#5634) | Pierrick Hymbert |
| 2024-02-20 | server : health endpoint configurable failure on no slot (#5594) | Pierrick Hymbert |
| 2024-02-18 | common, server : surface min_keep as its own parameter (#5567) | Robey Holderith |
| 2024-02-18 | server : slots monitoring endpoint (#5550) | Pierrick Hymbert |
| 2024-02-18 | server : enhanced health endpoint (#5548) | Pierrick Hymbert |
| 2024-02-18 | server : --n-predict option document and cap to max value (#5549) | Pierrick Hymbert |
| 2024-02-16 | server : add "samplers" param to control the samplers order (#5494) | Alexey Parfenov |
| 2024-02-16 | ggml : add numa options (#5377) | bmwl |
| 2024-02-11 | server : allow to specify tokens as strings in logit_bias (#5003) | Alexey Parfenov |
| 2024-02-07 | server : update `/props` with "total_slots" value (#5373) | Justin Parker |
| 2024-02-06 | server : add `dynatemp_range` and `dynatemp_exponent` (#5352) | Michael Coppola |
| 2024-02-05 | server : allow to get default generation settings for completion (#5307) | Alexey Parfenov |
| 2024-01-30 | server : improve README (#5209) | Wu Jian Ping |
| 2024-01-28 | docker : add server-first container images (#5157) | Kyle Mistele |
| 2024-01-27 | server : add self-extend support (#5104) | Maximilian Winter |
| 2024-01-11 | server : support for multiple api keys (#4864) | Michael Coppola |
| 2024-01-11 | server : update readme to document the new `/health` endpoint (#4866) | Behnam M |
| 2024-01-09 | server : update readme about token probs (#4777) | Behnam M |
| 2024-01-09 | server : add api-key flag to documentation (#4832) | Zsapi |
| 2024-01-04 | server : fix options in README.md (#4765) | Michael Coppola |
| 2023-12-29 | server : allow to generate multimodal embeddings (#4681) | Karthik Sethuraman |
| 2023-12-23 | server : allow to specify custom prompt for penalty calculation (#3727) | Alexey Parfenov |
| 2023-12-10 | Update README.md (#4388) | Yueh-Po Peng |
| 2023-11-25 | server : OAI API compatibility (#4198) | Georgi Gerganov |
| 2023-11-08 | server : add min_p param (#3877) | Mihai |
| 2023-11-05 | server : fix typo for --alias shortcut from -m to -a (#3958) | Thái Hoàng Tâm |
| 2023-10-22 | server : parallel decoding and multimodal (#3677) | Georgi Gerganov |
| 2023-10-17 | editorconfig : remove trailing spaces | Georgi Gerganov |
| 2023-10-17 | server : documentation of JSON return value of /completion endpoint (#3632) | coezbek |
| 2023-10-06 | server : docs fix default values and add n_probs (#3506) | Mihai |
| 2023-10-02 | infill : add new example + extend server API (#3296) | vvhg1 |
| 2023-09-28 | llama.cpp : split llama_context_params into model and context params (#3301) | slaren |
| 2023-08-27 | server : add `/detokenize` endpoint (#2802) | Bruce MacDonald |
| 2023-08-26 | examples : skip unnecessary external lib in server README.md how-to (#2804) | lon |