ik_llama.cpp.git
main
Age
Commit message
Author
2024-06-17
update: support Qwen2-57B-A14B (#7835)
Ștefan-Gabriel Muscalu
2024-06-17
Make updates to type cast based on compiler instead of OS (#7851)
Srihari-mcw
2024-06-17
llama : disable FA if KV head size do not match (#7982)
Georgi Gerganov
2024-06-17
Add Nix and Flox install instructions (#7899)
Bryan Honof
2024-06-17
sched : offload_op also requires supports_op (#7977)
slaren
2024-06-17
fix: divide 0 exception in mamba (#7932)
Frank Mai
2024-06-17
Implement non-mapped async IO for CUDA on Windows. (#7896)
Markus Tavenrath
2024-06-17
rpc : fix load/store misaligned addresses (#7948)
Georgi Gerganov
2024-06-17
gguf-dump.py: add --markdown dump output (#7853)
Brian
2024-06-17
[SYCL] Update README-sycl.md for Chapter "Recommended release" and "News" (#7...
Neo Zhang
2024-06-17
Add support for sqrt on CUDA (#7953)
Calvin Laurenson
2024-06-16
cuda : fix bounds check for src0 rows in MMVQ kernel (whisper/2231)
Georgi Gerganov
2024-06-16
ggml : fix and optimize ppc64le (ggml/849)
Hong Bo PENG
2024-06-16
ggml : remove duplicate include of ggml-common.h (ggml/853)
Daniel Bevenius
2024-06-16
flake.lock: Update (#7951)
Georgi Gerganov
2024-06-16
unicode : avoid char32_t (#7957)
Georgi Gerganov
2024-06-16
readme : update UI list [no ci] (#7958)
hopkins385
2024-06-16
ggml : fix handling of zero blocks in IQ quants (#7955)
Georgi Gerganov
2024-06-16
github : update pr template
Georgi Gerganov
2024-06-16
Vulkan Shader Refactor, Memory Debugging Option (#7947)
0cc4m
2024-06-15
Add `cvector-generator` example (#7514)
Xuan Son Nguyen
2024-06-15
[SYCL] remove global variables (#7710)
Meng, Hengyu
2024-06-14
ci : fix macos x86 build (#7940)
olexiyb
2024-06-14
CUDA: faster q2_K, q3_K MMQ + int8 tensor cores (#7921)
Johannes Gäßler
2024-06-14
metal : utilize max shared memory for mul_mat_id (#7935)
Georgi Gerganov
2024-06-14
llama-bench : fix RPC indication (#7936)
Radoslav Gerganov
2024-06-14
llama : more checks before assuming FIM tokens (#7644)
Sigbjørn Skjæret
2024-06-14
convert : add Poro-34B-chat tokenizer support (#7713)
Elaine
2024-06-13
rpc : fix ggml_backend_rpc_supports_buft() (#7918)
Radoslav Gerganov
2024-06-13
readme : Remove outdated instructions from README.md (#7914) [no ci]
Galunid
2024-06-13
move BLAS to a separate backend (#6210)
slaren
2024-06-13
`build`: rename main → llama-cli, server → llama-server, llava-cli → ll...
Olivier Chafik
2024-06-12
CUDA: fix broken oob check for FA vec f32 kernel (#7904)
Johannes Gäßler
2024-06-12
tests : add non-cont unary tests (#7857)
Georgi Gerganov
2024-06-12
ggml : improve ggml_is_contiguous logic (#7856)
Georgi Gerganov
2024-06-12
server : restore numeric prompts (#7883)
Georgi Gerganov
2024-06-12
update intel docker oneapi-basekit to 2024.1.1-devel-ubuntu22.04 (#7894)
Meng, Hengyu
2024-06-12
Fix a typo and add Fedora 40 pacakge to install for Vulkan (#7794) [no ci]
Patrice Ferlet
2024-06-11
vulkan: select only one device for single gpu with multiple drivers (#7582)
k.h.lai
2024-06-11
Update Vulkan RoPE implementation (#7818)
0cc4m
2024-06-12
fix broken link in pr template (#7880) [no ci]
Deven Mistry
2024-06-11
github: move PR template to .github/ root (#7868)
Brian
2024-06-11
llama-bench: more compact markdown tables (#7879)
Johannes Gäßler
2024-06-11
tests : check the Python version (#7872)
Georgi Gerganov
2024-06-11
CUDA: int8 tensor cores for MMQ (q4_K, q5_K, q6_K) (#7860)
Johannes Gäßler
2024-06-11
fix CUDA CI by using a windows-2019 image (#7861)
slaren
2024-06-11
json: refine constraint for whitespace to avoid runaways yet allow pretty pri...
Olivier Chafik
2024-06-11
`json`: document schema conversion in GBNF readme, align manual grammar examp...
Olivier Chafik
2024-06-10
cmake : fix CMake requirement for CUDA (#7821)
Jared Van Bortel
2024-06-10
ci : try win-2019 on server windows test (#7854)
slaren