Age | Commit message | Author
2024-06-09 | convert-hf : set the model name based on cli arg, if present (#7693) | sasha0552
2024-06-09 | convert-hf : match model part name prefix and suffix (#7687) | compilade
2024-06-09 | gguf-py : decouple adding metadata from writing in GGUFWriter (#7827) | compilade
2024-06-09 | Revert "[SYCL] Update rpc-server.cpp to include SYCL backend (#7682)" (#7808) | slaren
2024-06-08 | url: save -mu downloads to new cache location (#7826) | Olivier Chafik
2024-06-08 | server : smart slot selection using Longest Common Prefix (#7728) | sasha0552
2024-06-07 | vulkan : reuse parent extra for views (#7806) | slaren
2024-06-07 | gguf-split : change binary multi-byte units to decimal (#7803) | Christian Zhou-Zheng
2024-06-07 | cmake : fix BUILD_SHARED_LIBS=ON build (#7784) | intelmatt
2024-06-07 | server: update cache_prompt documentation [no ci] (#7745) | Johannes Gäßler
2024-06-07 | server : do not get prompt in infill mode (#7286) | woodx
2024-06-07 | [SYCL] fix softmax r2r result wrong issue (#7811) | pengxin99
2024-06-07 | check for nans in imatrix and quantize (#7807) | slaren
2024-06-06 | server : fix --threads-http arg (#7801) | Georgi Gerganov
2024-06-06 | imatrix : migrate to gpt_params (#7771) | Georgi Gerganov
2024-06-06 | Added support for . (any character) token in grammar engine. (#6467) | Clint Herron
2024-06-06 | README minor fixes (#7798) [no ci] | Mattheus Chediak
2024-06-06 | grammars: x{min,max} repetition operator (#6640) | Olivier Chafik
2024-06-06 | llama : add jina v2 base code (#7596) | Joan Fontanals
2024-06-06 | docker : build only main and server in their images (#7782) | slaren
2024-06-06 | docker : add openmp lib (#7780) | slaren
2024-06-06 | Fix encoding in python scripts (#7733) | Galunid
2024-06-05 | CUDA: refactor mmq, dmmv, mmvq (#7716) | Johannes Gäßler
2024-06-05 | ggml : refactor rope norm/neox (#7634) | Georgi Gerganov
2024-06-05 | readme : remove -ins (#7759) | arch-btw
2024-06-05 | Fix per token atrributes bits (#7749) | jaime-m-p
2024-06-04 | Allow number of nodes in CUDA graph to change (#7738) | agray3
2024-06-04 | common : refactor cli arg parsing (#7675) | Georgi Gerganov
2024-06-04 | ggml : remove OpenCL (#7735) | Georgi Gerganov
2024-06-04 | llama : remove beam search (#7736) | Georgi Gerganov
2024-06-04 | readme : remove obsolete Zig instructions (#7471) | Georgi Gerganov
2024-06-04 | llama-bench : allow using a different printer for stderr with -oe (#7722) | slaren
2024-06-04 | Improve hipBLAS support in CMake (#7696) | Daniele
2024-06-04 | refine .gitignore (#7688) | zhouwg
2024-06-04 | Per token attributes (#7685) | jaime-m-p
2024-06-04 | ggml : prevent builds with -ffinite-math-only (#7726) | Georgi Gerganov
2024-06-03 | llama : offload to RPC in addition to other backends (#7640) | Radoslav Gerganov
2024-06-03 | ggml : use OpenMP as a thread pool (#7606) | Masaya, Kato
2024-06-03 | make: fix debug options not being applied to NVCC (#7714) | Johannes Gäßler
2024-06-03 | Vulkan Mixture of Experts (MoE) support (#7628) | 0cc4m
2024-06-03 | cmake : add pkg-config spec file for llama.cpp (#7702) | Andy Tai
2024-06-03 | llama : MiniCPM support tied embeddings (#7664) | zhangkaihuo
2024-06-03 | llama : avoid double token-to-piece cache (#7654) | Georgi Gerganov
2024-06-03 | kompute : implement op_getrows_f32 (#6403) | woachk
2024-06-02 | fix bug introduced in using calloc (#7701) | Dave Airlie
2024-06-02 | flake.lock: Update (#7686) | Georgi Gerganov
2024-06-02 | chore : add ignore rule for generated server themes (#7689) | Austin
2024-06-02 | [SYCL] Update rpc-server.cpp to include SYCL backend (#7682) | nickp27
2024-06-01 | Fix FlashAttention debug test, FP32 assert (#7684) | Johannes Gäßler
2024-06-01 | server : new UI (#7633) | Yazan Agha-Schrader