Age | Commit message (Collapse) | Author |
|
* common : normalize naming style
ggml-ci
* common : match declaration / definition order
* zig : try to fix build
|
* phi3 : duplicate rope factors in each layer
phi3 : set phi-3 model type as 14B
model loader : simplify the process for duplicating model tensors
llama-bench : remove default pg test
* replace bool parameters in llama_model_loader with named flags
|
build (#7426)
|
* cuda : fix rope pos data
ggml-ci
* ggml : drop mode & 1 == 1 support for ggml_rope
ggml-ci
* ggml : support freq_factors for f16 rope (CPU)
ggml-ci
* tests : add rope tests using frequency factors
ggml-ci
|
* add phi3 128k support in convert-hf-to-gguf
* add phi3 128k support in cuda
* address build warnings on llama.cpp
* adjust index value in cuda long rope freq factors
* add long rope support in ggml cpu backend
* make freq factors only depend on ctx size
* remove unused rope scaling type 'su' from gguf converter
* fix flint warnings on convert-hf-to-gguf.py
* use the short freq factors when the context size is smaller than the trained context size
* add one line of comments
* metal : support rope freq_factors
* ggml : update ggml_rope_ext API to support freq. factors
* backends : add dev messages to support rope freq. factors
* minor : style
* tests : update to use new rope API
* backends : fix pragma semicolons
* minor : cleanup
* llama : move rope factors from KV header to tensors
* llama : remove tmp assert
* cuda : fix compile warning
* convert : read/write n_head_kv
* llama : fix uninitialized tensors
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
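The frequency-factor changes listed in this commit can be illustrated with a minimal Python sketch. It assumes that each rotation angle is divided by a per-pair factor and that the long factors apply only when the requested context exceeds the trained context. The function names, the adjacent-pair layout, and the handling of the equal case are illustrative assumptions, not the ggml implementation.

```python
import math

def select_freq_factors(n_ctx, n_ctx_train, short_factors, long_factors):
    # Assumption: long factors are used only when n_ctx exceeds n_ctx_train.
    return long_factors if n_ctx > n_ctx_train else short_factors

def rope_angles(pos, head_dim, freq_base=10000.0, freq_factors=None):
    # One angle per pair of dimensions; head_dim is assumed to be even.
    angles = []
    for i in range(head_dim // 2):
        theta = pos * freq_base ** (-2.0 * i / head_dim)
        factor = 1.0 if freq_factors is None else freq_factors[i]
        angles.append(theta / factor)  # a factor > 1 slows the rotation
    return angles

def apply_rope(vec, pos, freq_base=10000.0, freq_factors=None):
    # Rotate each adjacent pair (vec[2i], vec[2i+1]) by its angle.
    out = list(vec)
    angles = rope_angles(pos, len(vec), freq_base, freq_factors)
    for i, theta in enumerate(angles):
        x0, x1 = vec[2 * i], vec[2 * i + 1]
        c, s = math.cos(theta), math.sin(theta)
        out[2 * i] = x0 * c - x1 * s
        out[2 * i + 1] = x0 * s + x1 * c
    return out
```

With all factors equal to 1.0 the sketch reduces to plain RoPE, which is why a missing factor list can be treated as a no-op.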
|
ggml-ci
|
* examples: cache hf model when --model not provided
|
* Update brute force test: add_special
* Update brute force test: default values for add_bos_token and add_eos_token
* Enable rtrim when pre-inserting BOS
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Revert "server : fix test regexes"
|
* Update brute force test: special tokens
* Fix added tokens
- Try to read 'added_tokens.json'.
- Try to read 'tokenizer_config.json'.
- Try to read 'tokenizer.json'.
* Fix special tokens rtrim
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* server : fix test regexes
|
* llama : remove Persimmon
* requirements : remove
|
* rpc : track allocated buffers
ref: #7407
* rpc : pack rpc_tensor tightly
|
* server : fix temperature
* server : disable tests relying on parallel determinism
* ci : change server Debug -> RelWithDebInfo
|
* Update SYCL upscale operation
* Formatting
* Remove messages
|
* add loongarch lsx and lasx optimized code
* Add loongarch compilation support to makefile
* revert stb_image.h
* optimize bytes_from_nibbles_32 and sum_i16_pairs_float
* fix undeclared
* format code
* update
* update 2
---------
Co-authored-by: Jinyang He <hejinyang@loongson.cn>
|
* server : don't pass temperature as string
* server : increase timeout
* tests : fix the fix 0.8f -> 0.8
ggml-ci
* tests : set explicit temperature
|
for enabling AVX512_BF16 (#7258)
|
* Fix empty Vulkan host buffers
Add fp32 fp16 matmul shader
Fix matmul shader alignment
* Remove deprecated tensor->backend uses
* Fix Vulkan validation errors on embedding models with no offloaded layers
* Fix Vulkan llava segfault when not offloading layers
|
* Add StableLM pre-tokenizer
* Fix space
* Fix trailing whitespace
|
https://github.com/actions/labeler#using-configuration-path-input-together-with-the-actionscheckout-action
Recommends the use of checkout action to use the correct repo context
when applying settings for PR labels
e.g.
steps:
  - uses: actions/checkout@v4 # Uploads repository content to the runner
    with:
      repository: "owner/repositoryName" # The one of the available inputs, visit https://github.com/actions/checkout#readme to find more
  - uses: actions/labeler@v5
    with:
      configuration-path: 'path/to/the/uploaded/configuration/file'
|
* logging: output capture in cuda module
* fix compile error
* fix: vsnprintf terminates with 0, string use not correct
* post review
* Update llama.cpp
Co-authored-by: slaren <slarengh@gmail.com>
* Update llama.cpp
Co-authored-by: slaren <slarengh@gmail.com>
---------
Co-authored-by: slaren <slarengh@gmail.com>
|
* Revert "ci : temporary disable sanitizer builds (#6128)"
This reverts commit 4f6d1337ca5a409dc74aca8c479b7c34408a69c0.
* ci : trigger
|
* android : use "ci-android" branch for CI
* ggml : disable SIMD exp and silu for 32-bit ARM
ggml-ci
* android : do not fetch, use add_subdirectory instead
* cmake : provide binary dir
|
Tie the weights for ARCH_STARCODER to support the larger Granite code models.
Partially addresses ggerganov/issues/7116
A few things still remain to be fixed.
Currently requires `--override-kv tokenizer.ggml.add_bos_token=bool:false`
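Weight tying, as described in this commit, means the output projection reuses the token embedding matrix instead of carrying its own tensor. A minimal Python sketch, assuming tensors are held in a plain dict keyed by name; the `resolve_output_weight` and `logits` helpers are hypothetical, and the tensor names are used here only for illustration:

```python
def resolve_output_weight(tensors):
    # Prefer a dedicated output tensor; otherwise fall back to the
    # token embedding matrix (tied weights).
    output = tensors.get("output.weight")
    if output is None:
        output = tensors["token_embd.weight"]
    return output

def logits(hidden, weight):
    # weight has one row per vocabulary entry; each logit is a dot product
    # between the hidden state and that row.
    return [sum(h * w for h, w in zip(hidden, row)) for row in weight]
```

The fallback returns the same object rather than a copy, so a tied model stores the matrix only once.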
|
Fix floating point error with ndot printing; allow end stats on lower task numbers for multiple-choice tasks.
|
* Update and fix Vulkan softmax implementation
* Update and fix Vulkan argsort implementation
|