(#2588)
* server : implement json-schema-to-grammar.mjs, following the Python implementation
* server : add grammar support in chat.mjs
* server : implement grammar param in the UI
* server : generate .hpp
* server : remove trailing whitespace
* server : generate .hpp
* server : fix sorting of prop pairs
* server : optimize regex & iteration

* server : fix wrong variable name in timing JSON
* remove redundant entry

versions of Windows.

* Allow passing grammar to completion endpoint

* Update Vim plugin
* Remove getbufoneline usage, add input-bind example.
getbufoneline() appears to be a recently added function and has been
replaced with getbufline() for compatibility.
An additional example was added explaining how to add a keybind that
works in insert mode.

* common.cpp : Add --rope-scale parameter
* README.md : Add info about using linear rope scaling (see the sketch below)
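
Note: a minimal sketch of what linear rope scaling computes, not the actual patch; rope_angle and the example values are illustrative, assuming --rope-scale N means positions are effectively divided by N. A model trained for n_ctx tokens can then cover roughly N * n_ctx positions.

    #include <cmath>
    #include <cstdio>

    // Rotary angle for dimension pair i of a head with n_dims dimensions,
    // at position pos, with linear scaling factor `scale` (--rope-scale).
    float rope_angle(int pos, int i, int n_dims, float scale, float base = 10000.0f) {
        const float theta = std::pow(base, -2.0f * i / n_dims); // standard RoPE frequency
        return (pos / scale) * theta;                           // linear position scaling
    }

    int main() {
        // With scale = 2, position 8192 gets the same angle as position 4096 unscaled.
        std::printf("%f\n", rope_angle(8192, 0, 128, 2.0f));
        std::printf("%f\n", rope_angle(4096, 0, 128, 1.0f));
    }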

persistence (#2521)

* Fix race condition in server.cpp and partial stream handling in completion.js
* Revert assert edits
* Add newline at EOF

* examples : add JSON schema grammars
* complete JSON grammar
* ensure primitive types can be used as root of schema
* support integer type and adjust usage text

* fix hellaswag print format, cast away warning in test-double-float
* C++11 cannot use designated initializers (see the sketch below)
* add static to test-grad0.c internal functions
* use memcpy in test-double-float.c
* port C tests to C++
* use initializer list for ggml_init_params
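
Note: on the C++11 point, a hedged illustration with a struct shaped like ggml_init_params (field set assumed for illustration). Designated initializers are a C99 feature that C++ only standardized in C++20, so C++11 builds must fall back to positional aggregate initialization.

    #include <cstddef>

    // Shaped like ggml_init_params (fields assumed for illustration).
    struct init_params {
        size_t mem_size;
        void * mem_buffer;
        bool   no_alloc;
    };

    // C99 / C++20 designated initializers -- rejected by C++11:
    //   init_params p = { .mem_size = 16*1024*1024, .mem_buffer = nullptr, .no_alloc = false };

    // C++11 positional initializer list, as the commit uses:
    init_params p = { 16*1024*1024, nullptr, false };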

* server : Support dark mode
So it respects the user's system light/dark setting.
* Update index.html.hpp by running ./deps.sh

* common.h : add hellaswag / remove perplexity-lines
* common.cpp : add hellaswag / remove perplexity-lines
* perplexity.cpp : add hellaswag scores / remove perplexity-lines (see the sketch below)
* perplexity.cpp : clean up
* common.h : change default param value
* common.cpp : change default param value
* perplexity.cpp : alter wording
* common.h : alter wording
* common.cpp : alter wording
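
Note: a minimal sketch of how a HellaSwag score is computed, not the perplexity.cpp implementation; the `score` callback stands in for the model's length-normalized log-likelihood of an ending given its context. The reported score is the percentage of tasks where the most likely ending is the gold one.

    #include <functional>
    #include <string>
    #include <vector>

    struct hs_task {
        std::string context;
        std::vector<std::string> endings; // typically 4 candidate endings
        int gold;                         // index of the correct ending
    };

    // score(context, ending) stands in for the model's length-normalized
    // log-likelihood of `ending` given `context`.
    double hellaswag_score(const std::vector<hs_task> &tasks,
                           const std::function<double(const std::string &, const std::string &)> &score) {
        int correct = 0;
        for (const auto &t : tasks) {
            int best = 0;
            for (int i = 1; i < (int) t.endings.size(); ++i) {
                if (score(t.context, t.endings[i]) > score(t.context, t.endings[best])) {
                    best = i;
                }
            }
            if (best == t.gold) ++correct;
        }
        return 100.0 * correct / (double) tasks.size(); // reported as a percentage
    }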

* add: server chat mode with Llama 2
* fix: remove the unnecessary trailing \n

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>

* add `--in-prefix-bos` to prefix BOS to user inputs; keep EOS
The BOS precedes the string specified by `--in-prefix`.
Model-generated EOS is now kept in the context.
This provides a way to strictly follow the prompt format used in
Llama-2-chat (see the sketch below).
The EOS handling also benefits some existing finetunes that use
EOS to mark the end of a turn.
* examples/common: move input_prefix_bos to the other bools
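
Note: a minimal sketch of the described turn construction, not the actual main.cpp change; token_bos() and tokenize() are dummy stand-ins for the llama.cpp API.

    #include <string>
    #include <vector>

    using llama_token = int;

    llama_token token_bos() { return 1; }                      // dummy stand-in
    std::vector<llama_token> tokenize(const std::string &s) {  // dummy stand-in
        return std::vector<llama_token>(s.size(), 0);
    }

    // Tokens for one user turn under --in-prefix-bos: BOS first, then the
    // --in-prefix string, then the user input. Any EOS the model generated
    // in the previous turn is left in the context, not stripped.
    std::vector<llama_token> build_user_turn(bool input_prefix_bos,
                                             const std::string &in_prefix,
                                             const std::string &user_input) {
        std::vector<llama_token> toks;
        if (input_prefix_bos) {
            toks.push_back(token_bos()); // BOS precedes the --in-prefix string
        }
        for (llama_token t : tokenize(in_prefix))  toks.push_back(t);
        for (llama_token t : tokenize(user_input)) toks.push_back(t);
        return toks;
    }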

* escape HTML in webchat
* add amp

* make rms_norm_eps a parameter (see the sketch below)
* add rms_norm_eps to command line
* fix baby llama, test-grad0
* use scientific notation for eps param in the help
ggml-ci
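
Note: a minimal sketch of where rms_norm_eps enters the computation, not the ggml kernel. The epsilon keeps the denominator away from zero; the commit turns it from a hard-coded constant into a parameter.

    #include <cmath>
    #include <vector>

    // RMSNorm over one row: y[i] = x[i] / sqrt(mean(x^2) + eps).
    // `eps` is the value the commit exposes on the command line.
    std::vector<float> rms_norm(const std::vector<float> &x, float eps = 1e-5f) {
        double sum_sq = 0.0;
        for (float v : x) sum_sq += (double) v * v;
        const float scale = 1.0f / std::sqrt((float) (sum_sq / x.size()) + eps);
        std::vector<float> y(x.size());
        for (size_t i = 0; i < x.size(); ++i) y[i] = x[i] * scale;
        return y;
    }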

* makefile: correct deps for server
* server: tighten settings layout a little
* server: expose all currently configured generation params in UI
* server: expose remaining generation params, for the adventurous
* server: improve mirostat fields

* llama, main : constrain sampling to grammar (see the sketch below)
* allow loading grammar from file
* fix whitespace errors
* handle & print parser errors
* add comments to grammar syntax and allow newlines where unambiguous
* add missing include
* support alternates in root rule
* fix bugs with empty token and EOS
* adjust JSON grammar
* remove swp file
* rewrite ternary expressions
Co-authored-by: Henri Vasserman <henv@hot.ee>
* use struct for grammar elements and add Unicode support
* add Unicode escapes
* add inverse char ranges
* only sample full tokens (no peeking or truncation)
* llama : minor style changes,
blindly applied in the online editor - hopefully I didn't break anything
* update help text
* add warning message if EOS is disabled
---------
Co-authored-by: Henri Vasserman <henv@hot.ee>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
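
Note: a much-simplified sketch of the constraining step, not the llama.cpp grammar engine; the `accepts` callback stands in for the parser that checks whether a token's full text can extend the current parse state. Candidates the grammar rejects are masked out before sampling, and only whole tokens are considered (matching "only sample full tokens" above).

    #include <cmath>
    #include <functional>
    #include <vector>

    struct candidate {
        int   id;     // token id
        float logit;
    };

    // Mask every candidate the grammar cannot accept; the usual samplers
    // then run on what is left. accepts(id) should be true iff the
    // token's complete text is a valid continuation of the parse state.
    void grammar_mask(std::vector<candidate> &cands,
                      const std::function<bool(int)> &accepts) {
        for (auto &c : cands) {
            if (!accepts(c.id)) {
                c.logit = -INFINITY; // impossible under the grammar
            }
        }
    }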

* Add gqa parameter support to the server
* Change help from stderr to stdout

* Fix #2345, fix incorrect n_threads
* Update examples/common.cpp
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* CUDA: GQA implementation (see the sketch below)
* llama : support for GQA and LLaMA v2 70B
ggml-ci
* py : fix hparams parsing (if-else blocks)
ggml-ci
* py : oh boy ..
ggml-ci
* help : fix gqa value for 70B
ggml-ci
---------
Co-authored-by: JohannesGaessler <johannesg@5d6.de>
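
Note: a minimal sketch of the grouped-query attention mapping, not the CUDA kernel. Under GQA, n_head query heads share n_head_kv key/value heads; for LLaMA-2 70B that is 64 query heads over 8 KV heads (the gqa value of 8 mentioned above), and query head q reads KV head q / (n_head / n_head_kv).

    #include <cassert>
    #include <cstdio>

    // Key/value head shared by query head q_head under GQA.
    int kv_head_for(int q_head, int n_head, int n_head_kv) {
        assert(n_head % n_head_kv == 0);
        return q_head / (n_head / n_head_kv);
    }

    int main() {
        // LLaMA-2 70B: query heads 0..7 -> KV head 0, 8..15 -> KV head 1, ...
        for (int h = 0; h < 64; h += 8) {
            std::printf("q_head %2d -> kv_head %d\n", h, kv_head_for(h, 64, 8));
        }
    }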

Use the builtin json_encode() and json_decode() functions to simplify escaping.
Removes the need for temp files.

* Add parameter --perplexity-lines to perplexity.cpp (see the formula below)
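
Note: for reference, the standard perplexity definition that perplexity.cpp reports; with --perplexity-lines the same quantity is assumed to be computed per input line rather than over a sliding window. Over N predicted tokens:

    \mathrm{PPL} = \exp\!\left(-\frac{1}{N} \sum_{i=1}^{N} \log p(x_i \mid x_{<i})\right)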

Vim plugin for the server executable

models from local HF Transformer models (#2311)
* Resync my fork with new llama.cpp commits
* examples : rename to use dash instead of underscore
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

promt -> prompt

guidance scale (#2280)

A Makefile fix for FreeBSD users: on this platform, x86_64 is reported as amd64.
This fix resolves compilation with -march=native and -mtune=native in CFLAGS and CXXFLAGS.
Add two examples for interactive mode using Llama 2 models (thanks to TheBloke for the models).
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Miku.sh: Set default model to llama-2-7b-chat
* Miku.sh: Set ctx_size to 4096
* Miku.sh: Add in-prefix/in-suffix opts
* Miku.sh: Switch sampler to mirostat_v2 and tiny prompt improvements