ik_llama.cpp.git (branch: main)
Age | Commit message | Author
2024-06-09 | convert-hf : set the model name based on cli arg, if present (#7693) | sasha0552
2024-06-09 | convert-hf : match model part name prefix and suffix (#7687) | compilade
2024-06-09 | gguf-py : decouple adding metadata from writing in GGUFWriter (#7827) | compilade
2024-06-09 | Revert "[SYCL] Update rpc-server.cpp to include SYCL backend (#7682)" (#7808) | slaren
2024-06-08 | url: save -mu downloads to new cache location (#7826) | Olivier Chafik
2024-06-08 | server : smart slot selection using Longest Common Prefix (#7728) | sasha0552
2024-06-07 | vulkan : reuse parent extra for views (#7806) | slaren
2024-06-07 | gguf-split : change binary multi-byte units to decimal (#7803) | Christian Zhou-Zheng
2024-06-07 | cmake : fix BUILD_SHARED_LIBS=ON build (#7784) | intelmatt
2024-06-07 | server: update cache_prompt documentation [no ci] (#7745) | Johannes Gäßler
2024-06-07 | server : do not get prompt in infill mode (#7286) | woodx
2024-06-07 | [SYCL] fix softmax r2r result wrong issue (#7811) | pengxin99
2024-06-07 | check for nans in imatrix and quantize (#7807) | slaren
2024-06-06 | server : fix --threads-http arg (#7801) | Georgi Gerganov
2024-06-06 | imatrix : migrate to gpt_params (#7771) | Georgi Gerganov
2024-06-06 | Added support for . (any character) token in grammar engine. (#6467) | Clint Herron
2024-06-06 | README minor fixes (#7798) [no ci] | Mattheus Chediak
2024-06-06 | grammars: x{min,max} repetition operator (#6640) | Olivier Chafik
2024-06-06 | llama : add jina v2 base code (#7596) | Joan Fontanals
2024-06-06 | docker : build only main and server in their images (#7782) | slaren
2024-06-06 | docker : add openmp lib (#7780) | slaren
2024-06-06 | Fix encoding in python scripts (#7733) | Galunid
2024-06-05 | CUDA: refactor mmq, dmmv, mmvq (#7716) | Johannes Gäßler
2024-06-05 | ggml : refactor rope norm/neox (#7634) | Georgi Gerganov
2024-06-05 | readme : remove -ins (#7759) | arch-btw
2024-06-05 | Fix per token atrributes bits (#7749) | jaime-m-p
2024-06-04 | Allow number of nodes in CUDA graph to change (#7738) | agray3
2024-06-04 | common : refactor cli arg parsing (#7675) | Georgi Gerganov
2024-06-04 | ggml : remove OpenCL (#7735) | Georgi Gerganov
2024-06-04 | llama : remove beam search (#7736) | Georgi Gerganov
2024-06-04 | readme : remove obsolete Zig instructions (#7471) | Georgi Gerganov
2024-06-04 | llama-bench : allow using a different printer for stderr with -oe (#7722) | slaren
2024-06-04 | Improve hipBLAS support in CMake (#7696) | Daniele
2024-06-04 | refine .gitignore (#7688) | zhouwg
2024-06-04 | Per token attributes (#7685) | jaime-m-p
2024-06-04 | ggml : prevent builds with -ffinite-math-only (#7726) | Georgi Gerganov
2024-06-03 | llama : offload to RPC in addition to other backends (#7640) | Radoslav Gerganov
2024-06-03 | ggml : use OpenMP as a thread pool (#7606) | Masaya, Kato
2024-06-03 | make: fix debug options not being applied to NVCC (#7714) | Johannes Gäßler
2024-06-03 | Vulkan Mixture of Experts (MoE) support (#7628) | 0cc4m
2024-06-03 | cmake : add pkg-config spec file for llama.cpp (#7702) | Andy Tai
2024-06-03 | llama : MiniCPM support tied embeddings (#7664) | zhangkaihuo
2024-06-03 | llama : avoid double token-to-piece cache (#7654) | Georgi Gerganov
2024-06-03 | kompute : implement op_getrows_f32 (#6403) | woachk
2024-06-02 | fix bug introduced in using calloc (#7701) | Dave Airlie
2024-06-02 | flake.lock: Update (#7686) | Georgi Gerganov
2024-06-02 | chore : add ignore rule for generated server themes (#7689) | Austin
2024-06-02 | [SYCL] Update rpc-server.cpp to include SYCL backend (#7682) | nickp27
2024-06-01 | Fix FlashAttention debug test, FP32 assert (#7684) | Johannes Gäßler
2024-06-01 | server : new UI (#7633) | Yazan Agha-Schrader