Age | Commit message (Collapse) | Author |
|
|
|
|
|
(#2588)
* server : implement json-schema-to-grammar.mjs by follow python impl
* server : add grammar support in chat.mjs
* server : implement grammer param in the UI
* server : generate .hpp
* server : remove trailing whitespaces
* server : generate .hpp
* server : fix sort of prop pairs
* server : optimize regex & iteration
|
|
* Enhance Windows 7 compatibility.
* Clean away unnecessary preprocessor conditional
|
|
* adds simple grammar parsing tests
* adds cassert header
|
|
|
|
|
|
* server: fixed wrong variable name in timing json
* remove redunct entry
|
|
versions of Windows.
|
|
|
|
|
|
* ggml-alloc: Don't try to re-use buffers of external tensors
They might be weights that came from another context, so we
have no control over them (and they might be re-used elsewhere
so writing to them would be a bad idea).
* ggml-alloc: >= when checking for out-of-bounds
Co-authored-by: slaren <slarengh@gmail.com>
---------
Co-authored-by: slaren <slarengh@gmail.com>
|
|
* add log_callback to llama_context_params for custom logging.
* Fix macro expansion on gcc
* Add struct llama_state for global variables and move log_callback there
* Turn log level into enum and some minor changes.
* Remove model_for_logging parameter (not needed anymore)
* Convert remaining fprintf(stderr, ...) calls to use new macros.
* Fix enum and initialize g_state
* Fix log calls after merge
* Fix missing static
* Add back all the new lines in the logging strings
* Add comment for llama_log_callback and replace remaining printf calls
---------
Co-authored-by: grahameth <->
Co-authored-by: Helmut <helmut.buhler@inf.h-brs.de>
|
|
|
|
* Allow passing grammar to completion endpoint
|
|
|
|
|
|
|
|
* Update Vim plugin
* Remove getbufoneline usage, Add input bind example.
getbufoneline() appears to be a recently added function and has been
replaced with getbufline for compatibility.
An additional example that explains how to add a keybind that works in
insert mode was added.
|
|
* common.cpp : Add --rope-scale parameter
* README.md : Add info about using linear rope scaling
|
|
* ggml : mul mat wip
ggml-ci
* ggml : alternative thread distribution for mul_mat
ggml-ci
* ggml : mul_mat block tiling attempt
* ggml : mul_mat threads yield
ggml-ci
|
|
|
|
ggml-ci
|
|
ggml-ci
|
|
|
|
* metal : fix out-of-bounds access + style changes
* metal : increase concurrency nodes to 2*GGML_MAX_NODES
|
|
|
|
* zig build fixes
* Disable LTO on Windows.
|
|
persistence (#2521)
|
|
|
|
|
|
|
|
|
|
|
|
Fixes #2503
|
|
|
|
* Fixing race condition in server.cpp and partial stream handling in completion.js
* Reverting assert edits.
* Adding newline to eof
|
|
upfront (#2488)
* added stream saving context data to file to avoid allocating unnecessary amounts of memory
* generalised copying state data to file or buffer
* added comments explaining how copy_state_data works
* fixed trailing whitespaces
* fixed save load state example
* updated save load state to use public function in llama.cpp
* - restored breakage of the llama_copy_state_data API
- moved new logic for copying llama state data to internal function
* fixed function declaration order
* restored save load state example
* fixed whitepace
* removed unused llama-util.h include
* Apply suggestions from code review
Co-authored-by: slaren <slarengh@gmail.com>
* Apply code review suggestions
Co-authored-by: slaren <slarengh@gmail.com>
---------
Co-authored-by: slaren <slarengh@gmail.com>
|
|
|
|
* examples : add JSON schema grammars
* complete JSON grammar
* ensure primitive types can be used as root of schema
* support integer type and adjust usage text
|
|
|
|
|
|
* support bpe tokenizer in convert
Signed-off-by: ldwang <ftgreat@gmail.com>
* support bpe tokenizer in convert
Signed-off-by: ldwang <ftgreat@gmail.com>
* support bpe tokenizer in convert, fix
Signed-off-by: ldwang <ftgreat@gmail.com>
* Add Aquila-7B models in README.md
Signed-off-by: ldwang <ftgreat@gmail.com>
* Up Aquila-7B models in README.md
Signed-off-by: ldwang <ftgreat@gmail.com>
---------
Signed-off-by: ldwang <ftgreat@gmail.com>
Co-authored-by: ldwang <ftgreat@gmail.com>
|
|
* fix hellaswag print format, cast away warning in test-double-float
* c++11 cannot use designated initializers
* add static to test-grad0.c internal functions
* use memcpy in test-double-float.c
* port c tests to c++
* use initializer list for ggml_init_params
|
|
* add support for chinese llama-2 / alpaca-2
* remove white spaces
|
|
|
|
* server : Support dark mode
So it respects user system light / dark settings.
* Update index.html.hpp by running ./deps.sh
|
|
* Added gqa8 kernel to allow llama-2-70B on metal
* Update ggml-metal.m
Co-authored-by: Cebtenzzre <cebtenzzre@gmail.com>
* Extend kernel_mul_mat_f16_f32 to handle gqa broadcast
* Added ne03==ne13 assertion
---------
Co-authored-by: Cebtenzzre <cebtenzzre@gmail.com>
|
|
|
|
|