ik_llama.cpp.git: commit log, branch main
Age         Commit message  (Author)

2024-02-10  ggml : add abort_callback for cpu backend (ggml/725)  (Michael Podvitskiy)
2024-02-09  vulkan: Set limit for task concurrency (#5427)  (Neuman Vong)
2024-02-09  llava : add requirements.txt and update README.md (#5428)  (Daniel Bevenius)
2024-02-09  server : fix prompt caching for repeated prompts (#5420)  (Riley Stewart)
2024-02-09  llama : do not cap thread count when MoE on CPU (#5419)  (Paul Tsochantaris)
2024-02-09  readme : add JavaScript/Wasm repo (#5415)  (Marko Tasic)
2024-02-09  ggml : fix `error C2078: too many initializers` for MSVC ARM64 (#5404)  (Michael Podvitskiy)
2024-02-09  Fix Vulkan crash on APUs with very little device memory (#5424)  (0cc4m)
2024-02-08  CUDA: more warps for mmvq on NVIDIA (#5394)  (Johannes Gäßler)
2024-02-08  llama : do not print "offloading layers" message in CPU-only builds (#5416)  (slaren)
2024-02-08  Fix f16_sycl cpy call from Arc (#5411)  (Abhilash Majumder)
2024-02-08  llava : add missing .py, and fix paths in README.md (#5414)  (Daniel Bevenius)
2024-02-08  fix trailing whitespace (#5407)  (Johannes Gäßler)
2024-02-08  llama : fix MiniCPM (#5392)  (runfuture)
2024-02-08  llava: fix typo/formatting in README.md (#5405)  (Daniel Bevenius)
2024-02-08  sampling: fix top_k <= 0 (#5388)  (Johannes Gäßler)
2024-02-08  tests : .gitignore obj files  (Georgi Gerganov)
2024-02-07  CMAKE_OSX_ARCHITECTURES for MacOS cross compilation (#5393)  (Michael Podvitskiy)
2024-02-07  fix typo in readme (#5399)  (Ebey Abraham)
2024-02-07  Add Ava in the list of llama.cpp UIs (#4362)  (Kamil Tomšík)
2024-02-07  CUDA: fixed mmvq kernel for bs 2,3,4 and -sm row (#5386)  (Johannes Gäßler)
2024-02-07  [SYCL] update install make by w64devkit (#5297)  (Neo Zhang Jianyu)
2024-02-07  llava-cli : always tokenize special tokens (#5382)  (Xiao-Yong Jin)
2024-02-07  Basic Vulkan Multi-GPU implementation (#5321)  (0cc4m)
2024-02-07  readme : modernize (#5379)  (Eve)
2024-02-07  readme : update ui list (#5354)  (Ben Williams)
2024-02-07  llama : add MiniCPM support (#5346)  (runfuture)
2024-02-07  server : update `/props` with "total_slots" value (#5373)  (Justin Parker)
2024-02-06  convert : fix TypeError on GPT-2 vocab.json (#5288)  (Sang-Kil Park)
2024-02-06  server : remove model.json endpoint (#5371)  (Alexey Parfenov)
2024-02-06  CUDA: mul_mat_vec_q max. batch size 8 -> 4 (#5370)  (Johannes Gäßler)
2024-02-06  Update README.md (#5366)  (Kawrakow)
2024-02-06  Slight quantization improvement for Q4_K and Q5_K (#5361)  (Kawrakow)
2024-02-06  readme : add phi, orion 14b, internlm2, and yi-VL to readme (#5362)  (BarfingLemurs)
2024-02-06  CUDA: mul_mat_vec_q for batch sizes > 1 (#5351)  (Johannes Gäßler)
2024-02-06  server : include total "num_slots" in props endpoint (#5349)  (Justin Parker)
2024-02-06  server : add `dynatemp_range` and `dynatemp_exponent` (#5352)  (Michael Coppola)
2024-02-06  server : various fixes for the prompt field in /completion (#5300)  (Niall Coates)
2024-02-06  py : handle byte tokens in `get_token_type` (#5341)  (Georgi Gerganov)
2024-02-05  make: Use ccache for faster compilation (#5318)  (Johannes Gäßler)
2024-02-05  README: updated introduction (#5343)  (Johannes Gäßler)
2024-02-05  ggml : make use of ggml-quants.h possible in C++ code (#5338)  (Kawrakow)
2024-02-05  ggml : avoid duplicating function calls using MIN/MAX macros (#5325)  (Dr. Tom Murphy VII Ph.D)
2024-02-05  iq3_xxs: guards for the no-imatrix situation (#5334)  (Kawrakow)
2024-02-05  py : fix internlm2-hf convert to gguf (#5305)  (Guoteng)
2024-02-05  iq2_xxs: tune quantization (#5320)  (Kawrakow)
2024-02-05  server : allow to get default generation settings for completion (#5307)  (Alexey Parfenov)
2024-02-05  common : add dynamic temperature parameters to main example cli (#5295)  (l3utterfly)
2024-02-05  scripts : fix typos, cleanup (#5303)  (Georgi Gerganov)
2024-02-05  scripts : add non-interactive server-llm.sh (#5303)  (Нияз Гарифзянов)