path: root/examples
Age | Commit message | Author
2024-02-18 | ci : fix wikitext url + compile warnings (#5569) | Georgi Gerganov
2024-02-18 | common, server : surface min_keep as its own parameter (#5567) | Robey Holderith
2024-02-18 | server : slots monitoring endpoint (#5550) | Pierrick Hymbert
2024-02-18 | server : enhanced health endpoint (#5548) | Pierrick Hymbert
2024-02-18 | server : --n-predict option document and cap to max value (#5549) | Pierrick Hymbert
2024-02-18 | server : graceful server shutdown (#5244) | Daniel Hiltgen
2024-02-18 | ggml, common, examples, tests : fixed type arguments in printf (#5528) | Herman Semenov
2024-02-18 | llava : update surgery script to not remove tensors (#5536) | Daniel Bevenius
2024-02-18 | 1.5 bit quantization (#5453) | Kawrakow
2024-02-17 | ci : add an option to fail on compile warning (#3952) | Ananta Bastola
2024-02-16 | llava : removed excess free(NULL) operation (#5531) | Herman Semenov
2024-02-16 | server : add "samplers" param to control the samplers order (#5494) | Alexey Parfenov
2024-02-16 | server : fix system prompt cli (#5516) | Rőczey Barnabás
2024-02-16 | ggml : add numa options (#5377) | bmwl
2024-02-16 | llava : fix clip-model-is-vision flag in README.md (#5509) | Daniel Bevenius
2024-02-15 | clip : fix wrong loop condition | Georgi Gerganov
2024-02-15 | llava : fix memory management bug (#5491) | Elbios
2024-02-15 | llaba : hotfix for llava-1.6 image number (#5495) | John
2024-02-14 | llava : update README.md (#5489) | John
2024-02-14 | llava : support v1.6 (#5267) | John
2024-02-13 | gguf : add python reader example (#5216) | John
2024-02-13 | finetune : rename feed-forward tensors (w1/w2/w3) (#4839) | Daniel Bevenius
2024-02-13 | llama : support batched embeddings (#5466) | Douglas Hanley
2024-02-12 | llava : remove prog parameter from ArgumentParser (#5457) | Daniel Bevenius
2024-02-12 | sync : ggml (#5452) | Georgi Gerganov
2024-02-11 | Add support for BERT embedding models (#5423) | Douglas Hanley
2024-02-11 | server : allow to specify tokens as strings in logit_bias (#5003) | Alexey Parfenov
2024-02-11 | main : ctrl+C print timing in non-interactive mode (#3873) | Georgi Gerganov
2024-02-11 | lookup: add print for drafting performance (#5450) | Johannes Gäßler
2024-02-11 | server : add llama2 chat template (#5425) | Xuan Son Nguyen
2024-02-09 | llava : add requirements.txt and update README.md (#5428) | Daniel Bevenius
2024-02-09 | server : fix prompt caching for repeated prompts (#5420) | Riley Stewart
2024-02-08 | llava : add missing .py, and fix paths in README.md (#5414) | Daniel Bevenius
2024-02-08 | llava: fix typo/formatting in README.md (#5405) | Daniel Bevenius
2024-02-07 | llava-cli : always tokenize special tokens (#5382) | Xiao-Yong Jin
2024-02-07 | server : update `/props` with "total_slots" value (#5373) | Justin Parker
2024-02-06 | server : remove model.json endpoint (#5371) | Alexey Parfenov
2024-02-06 | server : include total "num_slots" in props endpoint (#5349) | Justin Parker
2024-02-06 | server : add `dynatemp_range` and `dynatemp_exponent` (#5352) | Michael Coppola
2024-02-06 | server : various fixes for the prompt field in /completion (#5300) | Niall Coates
2024-02-05 | server : allow to get default generation settings for completion (#5307) | Alexey Parfenov
2024-02-04 | Adding some imatrix tools (#5302) | Kawrakow
2024-02-03 | refactor : switch to emplace_back to avoid extra object (#5291) | Michael Klimenko
2024-02-02 | perplexity : fix KL divergence calculations on Windows (#5273) | kalomaze
2024-02-02 | [SYCL] update guide of SYCL backend (#5254) | Neo Zhang Jianyu
2024-02-01 | add --no-mmap in llama-bench (#5257) | Neo Zhang Jianyu
2024-01-31 | llama : remove LLAMA_MAX_DEVICES and LLAMA_SUPPORTS_GPU_OFFLOAD (#5240) | Georgi Gerganov
2024-01-31 | llava : add MobileVLM support (#5132) | JidongZhang-THU
2024-01-31 | format license text, restore apache license by legal suggestion (#5233) | Neo Zhang Jianyu
2024-01-31 | support SYCL backend windows build (#5208) | Neo Zhang Jianyu