ik_llama.cpp.git
main
path: root/Makefile
Age | Commit message | Author
2024-06-22 | iqk_mul_mat for llama.cpp | Iwan Kawrakow
2024-06-21 | JSON Schema to GBNF integration tests (#7790) | Clint Herron
2024-06-18 | Allow compiling with CUDA without CUDA runtime installed (#7989) | Ulrich Drepper
2024-06-16 | Vulkan Shader Refactor, Memory Debugging Option (#7947) | 0cc4m
2024-06-15 | Add `cvector-generator` example (#7514) | Xuan Son Nguyen
2024-06-13 | move BLAS to a separate backend (#6210) | slaren
2024-06-13 | `build`: rename main → llama-cli, server → llama-server, llava-cli → ll... | Olivier Chafik
2024-06-05 | CUDA: refactor mmq, dmmv, mmvq (#7716) | Johannes Gäßler
2024-06-04 | ggml : remove OpenCL (#7735) | Georgi Gerganov
2024-06-04 | llama : remove beam search (#7736) | Georgi Gerganov
2024-06-03 | llama : offload to RPC in addition to other backends (#7640) | Radoslav Gerganov
2024-06-03 | ggml : use OpenMP as a thread pool (#7606) | Masaya, Kato
2024-06-03 | make: fix debug options not being applied to NVCC (#7714) | Johannes Gäßler
2024-06-01 | server : new UI (#7633) | Yazan Agha-Schrader
2024-06-01 | CUDA: quantized KV support for FA vec (#7527) | Johannes Gäßler
2024-05-31 | Improve HIP compatibility (#7672) | Daniele
2024-05-27 | make: add --device-debug to NVCC debug flags (#7542) | Johannes Gäßler
2024-05-23 | ggml : drop support for QK_K=64 (#7473) | Georgi Gerganov
2024-05-20 | ggml : add loongarch lsx and lasx support (#6454) | junchao-loongson
2024-05-20 | llama : remove MPI backend (#7395) | slaren
2024-05-17 | ROCm: use native CMake HIP support (#5966) | Gavin Zhao
2024-05-08 | Introduction of CUDA Graphs to LLama.cpp (#6766) | agray3
2024-05-04 | tests : add test-tokenizer-0.sh + fix some tokenizers (#7036) | Georgi Gerganov
2024-04-29 | llama : fix BPE pre-tokenization (#6920) | Georgi Gerganov
2024-04-29 | make : change GNU make default CXX from g++ to c++ (#6966) | Przemysław Pawełczyk
2024-04-26 | quantize: add imatrix and dataset metadata in GGUF (#6658) | Pierrick Hymbert
2024-04-22 | llamafile : improve sgemm.cpp (#6796) | Justine Tunney
2024-04-21 | `build`: generate hex dump of server assets during build (#6661) | Olivier Chafik
2024-04-21 | llama : add option to render special/control tokens (#6807) | Georgi Gerganov
2024-04-17 | llamafile : tmp disable + build sgemm.o when needed (#6716) | Georgi Gerganov
2024-04-16 | ggml : fix llamafile sgemm wdata offsets (#6710) | Georgi Gerganov
2024-04-16 | ggml : add llamafile sgemm (#6414) | Justine Tunney
2024-04-15 | `main`: add --json-schema / -j flag (#6659) | Olivier Chafik
2024-04-11 | Refactor Error Handling for CUDA (#6575) | Nikolas
2024-04-11 | eval-callback: Example how to use eval callback for debugging (#6576) | Pierrick Hymbert
2024-04-06 | Tests: Added integration tests for GBNF parser (#6472) | Clint Herron
2024-04-04 | examples : add GBNF validator program (#5948) | Clint Herron
2024-03-27 | make : whitespace | Georgi Gerganov
2024-03-26 | wpm : portable unicode tolower (#6305) | Jared Van Bortel
2024-03-26 | cuda : rename build flag to LLAMA_CUDA (#6299) | slaren
2024-03-25 | cuda : refactor into multiple files (#6269) | slaren
2024-03-25 | examples : add "retrieval" (#6193) | Minsoo Cheong
2024-03-23 | split: add gguf-split in the make build target (#6262) | Pierrick Hymbert
2024-03-23 | lookup: complement data from context with general text statistics (#5479) | Johannes Gäßler
2024-03-22 | cuda : add LLAMA_CUDA_NO_PEER_COPY to workaround broken ROCm p2p copy (#6208) | slaren
2024-03-21 | json-schema-to-grammar improvements (+ added to server) (#5978) | Olivier Chafik
2024-03-19 | gguf-split: split and merge gguf per batch of tensors (#6135) | Pierrick Hymbert
2024-03-17 | common: llama_load_model_from_url using --model-url (#6098) | Pierrick Hymbert
2024-03-15 | make : ggml-metal.o depends on ggml.h | Georgi Gerganov
2024-03-14 | metal : build metallib + fix embed path (#6015) | Georgi Gerganov