| Age | Commit message | Author |
|---|---|---|
|  | * llama : add benchmark example
* add to examples CMakeLists.txt
* fix msvc build
* add missing include
* add Bessel's correction to stdev calculation
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
* improve markdown formatting
* add missing include
* print warning if NDEBUG is not defined
* remove n_prompt and n_gen from the matrix, use each value separately instead
* better checks for non-optimized builds
* llama.cpp : fix MEM_REQ_SCRATCH0 reusing the value of n_ctx of the first call
* fix json formatting
* add sql output
* add basic cpu and gpu info (linux/cuda only)
* markdown: also show values that differ from the default
* markdown: add build id
* cleanup
* improve formatting
* formatting
---------
Co-authored-by: Johannes Gäßler <johannesg@5d6.de> | 
|  |  | 
|  | * support for templates in browser LocalStorage
* sync accepted #2409 fix from upstream
* convert autosave invocation to useEffect
* Apply suggestions from code review
Co-authored-by: Jhen-Jie Hong <iainst0409@gmail.com>
* Regen index.html.cpp, suggested from code review
---------
Co-authored-by: Jhen-Jie Hong <iainst0409@gmail.com> | 
|  | Add --cfg-negative-prompt-file option for examples | 
|  | fixes #2611 | 
|  |  | 
|  |  | 
|  | (#2588)
* server : implement json-schema-to-grammar.mjs by follow python impl
* server : add grammar support in chat.mjs
* server : implement grammar param in the UI
* server : generate .hpp
* server : remove trailing whitespaces
* server : generate .hpp
* server : fix sort of prop pairs
* server : optimize regex & iteration | 
|  |  | 
|  | * server: fixed wrong variable name in timing json
* remove redundant entry | 
|  | versions of Windows. | 
|  |  | 
|  |  | 
|  | * Allow passing grammar to completion endpoint | 
|  |  | 
|  |  | 
|  | * Update Vim plugin
* Remove getbufoneline usage, Add input bind example.
getbufoneline() appears to be a recently added function and has been
replaced with getbufline for compatibility.
An additional example that explains how to add a keybind that works in
insert mode was added. | 
|  | * common.cpp : Add --rope-scale parameter
* README.md : Add info about using linear rope scaling | 
|  | persistence (#2521) | 
|  |  | 
|  |  | 
|  |  | 
|  | * Fixing race condition in server.cpp and partial stream handling in completion.js
* Reverting assert edits.
* Adding newline to eof | 
|  |  | 
|  | * examples : add JSON schema grammars
* complete JSON grammar
* ensure primitive types can be used as root of schema
* support integer type and adjust usage text | 
|  | * fix hellaswag print format, cast away warning in test-double-float
* c++11 cannot use designated initializers
* add static to test-grad0.c internal functions
* use memcpy in test-double-float.c
* port c tests to c++
* use initializer list for ggml_init_params | 
|  |  | 
|  | * server : Support dark mode
So it respects user system light / dark settings.
* Update index.html.hpp by running ./deps.sh | 
|  |  | 
|  | * common.h : add hellaswag / remove perplexity-lines
* common.cpp : add hellaswag / remove perplexity-lines
* perplexity.cpp : add hellaswag scores / remove perplexity-lines
* perplexity.cpp : clean up
* common.h : change default param value
* common.cpp : Change default param
* perplexity.cpp : alter wording
* common.h : alter wording
* common.cpp : alter wording | 
|  |  | 
|  | * add: server chat mode with llama2
* fix: remove the unnecessary last \n | 
|  |  | 
|  |  | 
|  | Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com> | 
|  | * add `--in-prefix-bos` to prefix BOS to user inputs; keep EOS
The BOS precedes the string specified by `--in-prefix`.
Model-generated EOS is now kept in the context.
This provides a way to strictly follow the prompt format used in
Llama-2-chat.
The EOS handling also benefits some existing finetunes that use
EOS to mark the end of a turn.
* examples/common: move input_prefix_bos to other bools | 
|  |  | 
|  | * escape HTML in webchat
* add amp | 
|  | * make rms_norm_eps a parameter
* add rms_norm_eps to command line
* fix baby llama, test-grad0
* use scientific notation for eps param in the help
ggml-ci | 
|  | * makefile: correct deps for server
* server: tighten settings layout a little
* server: expose all currently configured generation params in UI
* server: expose remaining generation params, for the adventurous
* server: embetter mirostat fields | 
|  | * llama, main : constrain sampling to grammar
* allow loading grammar from file
* fix whitespace errors
* handle & print parser errors
* add comments to grammar syntax and allow newlines where unambiguous
* add missing include
* support alternates in root rule
* fix bugs with empty token and EOS
* adjust JSON grammar
* remove swp file
* rewrite ternary expressions
Co-authored-by: Henri Vasserman <henv@hot.ee>
* use struct for grammar elements and add Unicode support
* add unicode escapes
* add inverse char ranges
* only sample full tokens (no peeking or truncation)
* llama : minor style changes
blindly applied in online editor - hopefully I didn't break something
* update help text
* add warning message if EOS is disabled
---------
Co-authored-by: Henri Vasserman <henv@hot.ee>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> | 
|  | * Add gqa parameter support to the server
* Change help from stderr to stdout | 
|  | * Fix #2345, fix incorrect n_threads
* Update examples/common.cpp
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> | 
|  | * CUDA: GQA implementation
* llama : support for GQA and LLaMAv2 70B
ggml-ci
* py : fix hparams parsing (if-else blocks)
ggml-ci
* py : oh boy ..
ggml-ci
* help : fix gqa value for 70B
ggml-ci
---------
Co-authored-by: JohannesGaessler <johannesg@5d6.de> | 
|  |  | 
|  | Uses builtin json_encode and json_decode functions to simplify escaping
Removes the need for temp files | 
|  |  | 
|  | * Add parameter --perplexity-lines to perplexity.cpp | 
|  | VIM plugin for server exe | 
|  | models from local HF Transformer models (#2311)
* Resync my fork with new llama.cpp commits
* examples : rename to use dash instead of underscore
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> |
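
The benchmark commit at the top of this log mentions adding Bessel's correction to the stdev calculation. As a rough illustration of what that correction means, here is a minimal sketch, not the code from the commit: the helper names `avg` and `stdev_bessel` are hypothetical, and the only assumption about the original is that it computes a standard deviation over repeated timing samples. Bessel's correction divides the sum of squared deviations by `n - 1` instead of `n`, which removes the downward bias in a variance estimated from a small sample.

```cpp
#include <cmath>
#include <numeric>
#include <vector>

// Hypothetical helper: arithmetic mean of the samples (0 for an empty input).
static double avg(const std::vector<double> & v) {
    if (v.empty()) {
        return 0.0;
    }
    return std::accumulate(v.begin(), v.end(), 0.0) / v.size();
}

// Hypothetical helper: sample standard deviation with Bessel's correction.
// The sum of squared deviations is divided by (n - 1) rather than n.
// Returns 0 when fewer than two samples are available.
static double stdev_bessel(const std::vector<double> & v) {
    if (v.size() < 2) {
        return 0.0;
    }
    const double m = avg(v);
    double sq_sum = 0.0;
    for (const double x : v) {
        sq_sum += (x - m) * (x - m);
    }
    return std::sqrt(sq_sum / (v.size() - 1));
}
```

For the samples `{2, 4, 4, 4, 5, 5, 7, 9}` the mean is 5 and the squared deviations sum to 32, so the uncorrected result would be `sqrt(32/8) = 2`, while the corrected one is `sqrt(32/7)`, slightly larger. The difference shrinks as the number of benchmark repetitions grows.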