Age | Commit message | Author |
|
* Merging mainline - WIP
* Merging mainline - WIP
AVX2 and CUDA appear to work.
CUDA performance seems slightly (~1-2%) lower, as is so often
the case with llama.cpp/ggml after some "improvements" have been made.
* Merging mainline - fix Metal
* Remove check
---------
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
|
|
* Fix phi3 template matching vs zephyr
* Add regression test for new phi3 chat template
* Implement review suggestions
* Fix phi3 jinja test templates & match by <|end|>
* Apply suggestion
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
* Add all phi3 template variants in tests
* Remove unneeded message trimming
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
* Fix tests to not expect trimmed messages
---------
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
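The gist of the fix, sketched as substring checks on the template source (the helper names below are illustrative, not the actual code): zephyr-style templates also contain <|user|>/<|assistant|> tags, so phi-3 has to be recognised first, keyed off its distinctive <|end|> terminator.

    #include <string>

    // Illustrative: match phi-3 before zephyr, using <|end|> (absent from zephyr) as the tiebreaker.
    static bool template_is_phi3(const std::string & tmpl) {
        return tmpl.find("<|assistant|>") != std::string::npos &&
               tmpl.find("<|end|>")       != std::string::npos;
    }

    static bool template_is_zephyr(const std::string & tmpl) {
        return !template_is_phi3(tmpl) &&
               tmpl.find("<|user|>") != std::string::npos;
    }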
|
|
* Add phi 3 chat template & tests
* test : fix chat template result
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
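For reference, the turn layout the phi-3 template produces looks roughly like this (a hand-written expectation in the spirit of the added test, not copied from it; message contents are made up):

    // Each turn is "<|role|>\n<content><|end|>\n"; requesting a generation prompt
    // appends a bare "<|assistant|>\n" header at the end.
    const char * expected_phi3 =
        "<|system|>\nYou are a helpful assistant<|end|>\n"
        "<|user|>\nHello<|end|>\n"
        "<|assistant|>\nHi there<|end|>\n"
        "<|user|>\nWho are you<|end|>\n"
        "<|assistant|>\n";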
|
|
* Added llama-3 chat template
* Update llama.cpp
Co-authored-by: Samuel Tallet <36248671+SamuelTallet@users.noreply.github.com>
* Update llama.cpp
Co-authored-by: Samuel Tallet <36248671+SamuelTallet@users.noreply.github.com>
* Update tests/test-chat-template.cpp
Co-authored-by: Samuel Tallet <36248671+SamuelTallet@users.noreply.github.com>
* Added EOS stop sequence according to https://github.com/ggerganov/llama.cpp/pull/6751#issuecomment-2065602862
* Removed the addition of the BOS token before the first message
* Removed the BOS token from the expected llama-3 output
* Update tests/test-chat-template.cpp
Co-authored-by: Rene Leonhardt <65483435+reneleonhardt@users.noreply.github.com>
* Update tests/test-chat-template.cpp
Co-authored-by: Rene Leonhardt <65483435+reneleonhardt@users.noreply.github.com>
* Added <|end_of_text|> as another stop token
* Reverted the previous change that added the end_of_text stop word for llama 3
---------
Co-authored-by: Wouter Tichelaar <tichelaarw@spar.net>
Co-authored-by: Samuel Tallet <36248671+SamuelTallet@users.noreply.github.com>
Co-authored-by: Rene Leonhardt <65483435+reneleonhardt@users.noreply.github.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
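The net effect of the BOS-related changes above is that the template itself emits no BOS (the tokenizer adds it) and each turn ends with <|eot_id|>; roughly (contents made up):

    // Llama 3 layout: header tokens around each role, content terminated by <|eot_id|>,
    // and an empty assistant header appended when a generation prompt is requested.
    const char * expected_llama3 =
        "<|start_header_id|>system<|end_header_id|>\n\nYou are a helpful assistant<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\nHello<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\nHi there<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n";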
|
|
* Add chat template for command-r model series
* Fix indentation
* Add chat template test for command-r models and update the implementation to trim whitespace
* Remove debug print
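A sketch of the command-r layout this adds, with the whitespace trimming noted above applied to each message (role token names from the Cohere models; contents made up):

    // Command-R style: every message sits inside <|START_OF_TURN_TOKEN|>...<|END_OF_TURN_TOKEN|>
    // with a role token, and the generation prompt opens a new CHATBOT turn.
    const char * expected_command_r =
        "<|START_OF_TURN_TOKEN|><|SYSTEM_TOKEN|>You are a helpful assistant<|END_OF_TURN_TOKEN|>"
        "<|START_OF_TURN_TOKEN|><|USER_TOKEN|>Hello<|END_OF_TURN_TOKEN|>"
        "<|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>Hi there<|END_OF_TURN_TOKEN|>"
        "<|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>";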
|
|
* Add openchat chat template
* Add chat template test for openchat
* Add chat template for vicuna
* Add chat template for orca-vicuna
* Add EOS for vicuna templates
* Combine vicuna chat templates
* Add tests for openchat and vicuna chat templates
* Add chat template for alpaca
* Add separate template name for vicuna-orca
* Remove alpaca, match deepseek with jinja output
* Regenerate chat template test with add_generation_prompt
* Separate deepseek bos from system message
* Match openchat template with jinja output
* Remove BOS token from templates, unprefix openchat
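Roughly what the final openchat and vicuna layouts look like after the BOS removal and the unprefixed system handling above (contents made up; the exact strings in the tests may differ):

    // OpenChat: system text emitted unprefixed, user/assistant turns prefixed with
    // "GPT4 Correct <Role>: " and terminated by <|end_of_turn|>; no BOS in the template.
    const char * expected_openchat =
        "You are a helpful assistant<|end_of_turn|>"
        "GPT4 Correct User: Hello<|end_of_turn|>"
        "GPT4 Correct Assistant: Hi there<|end_of_turn|>"
        "GPT4 Correct Assistant:";

    // Vicuna: plain USER:/ASSISTANT: turns with assistant replies closed by </s>;
    // the vicuna-orca variant additionally prefixes the system line with "SYSTEM: ".
    const char * expected_vicuna =
        "You are a helpful assistant\n\n"
        "USER: Hello\n"
        "ASSISTANT: Hi there</s>\n"
        "ASSISTANT:";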
|
|
* add gemma chat template
* gemma: only apply system_prompt to non-model messages
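A sketch of the gemma layout and of the rule in the second commit: gemma has no system role, so the system prompt is folded into the first non-model (user) turn rather than getting a turn of its own, and "assistant" is rendered as "model" (contents and the exact separator are illustrative):

    // Gemma uses <start_of_turn>/<end_of_turn> with only "user" and "model" roles; the
    // system prompt is prepended to the first user turn instead of becoming its own turn.
    const char * expected_gemma =
        "<start_of_turn>user\n"
        "You are a helpful assistant\n\nHello<end_of_turn>\n"
        "<start_of_turn>model\nHi there<end_of_turn>\n"
        "<start_of_turn>model\n";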
|
|
* server: fallback to chatml
* add new chat template
* server: add AlphaMonarch to test chat template
* server: only check model template if there is no custom tmpl
* remove TODO
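A sketch of the fallback behaviour described above (the helper name and surrounding wiring are made up; only the idea of probing the model's embedded template with llama_chat_apply_template is taken from the change):

    #include <cstdio>
    #include <string>
    #include "llama.h"

    // Illustrative: a user-supplied --chat-template wins unconditionally; the model's embedded
    // template is only probed, and chatml substituted, when no custom template was given.
    static std::string resolve_chat_template(const llama_model * model, const std::string & cli_template) {
        if (!cli_template.empty()) {
            return cli_template;   // custom template given: use it, skip checking the model's
        }
        llama_chat_message probe[] = {{"user", "test"}};
        char buf[256];
        int32_t res = llama_chat_apply_template(model, nullptr, probe, 1, true, buf, sizeof(buf));
        if (res < 0) {
            fprintf(stderr, "model's built-in chat template not supported, falling back to chatml\n");
            return "chatml";
        }
        return "";                 // empty: keep using the model's embedded template
    }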
|
|
* llama: add llama_chat_apply_template
* test-chat-template: remove redundant vector
* chat_template: do not use std::string for buffer
* add clarification for llama_chat_apply_template
* llama_chat_apply_template: add zephyr template
* llama_chat_apply_template: correct docs
* llama_chat_apply_template: use term "chat" everywhere
* llama_chat_apply_template: change variable name to "tmpl"
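A minimal, self-contained sketch of calling the new API, assuming the signature introduced here (model pointer, optional template override, message array, add-assistant flag, output buffer) and assuming "chatml" is accepted as a template name; passing an explicit template means the model pointer is not consulted, which is also how test-chat-template.cpp drives the function.

    #include <cstdio>
    #include <vector>
    #include "llama.h"

    int main() {
        // Conversation to format; roles follow the usual system/user/assistant naming.
        std::vector<llama_chat_message> chat = {
            {"system",    "You are a helpful assistant."},
            {"user",      "Hello"},
            {"assistant", "Hi there"},
            {"user",      "How are you?"},
        };

        // With an explicit template ("chatml") the model pointer is not used, so nullptr is fine;
        // with tmpl == nullptr the template embedded in the model's metadata is used instead.
        std::vector<char> buf(1024);
        int32_t res = llama_chat_apply_template(nullptr, "chatml", chat.data(), chat.size(),
                                                /*add_ass=*/true, buf.data(), (int32_t) buf.size());
        if (res > (int32_t) buf.size()) {
            // The return value is the required size: grow the buffer and apply again.
            buf.resize(res);
            res = llama_chat_apply_template(nullptr, "chatml", chat.data(), chat.size(),
                                            true, buf.data(), (int32_t) buf.size());
        }
        if (res < 0) {
            fprintf(stderr, "chat template not supported\n");
            return 1;
        }
        printf("%.*s\n", res, buf.data());
        return 0;
    }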
|