path: root/Makefile
Age | Commit message | Author
2024-06-22 | iqk_mul_mat for llama.cpp | Iwan Kawrakow
2024-06-21 | JSON Schema to GBNF integration tests (#7790) | Clint Herron
2024-06-18 | Allow compiling with CUDA without CUDA runtime installed (#7989) | Ulrich Drepper
2024-06-16 | Vulkan Shader Refactor, Memory Debugging Option (#7947) | 0cc4m
2024-06-15 | Add `cvector-generator` example (#7514) | Xuan Son Nguyen
2024-06-13 | move BLAS to a separate backend (#6210) | slaren
2024-06-13 | `build`: rename main → llama-cli, server → llama-server, llava-cli → ll... | Olivier Chafik
2024-06-05 | CUDA: refactor mmq, dmmv, mmvq (#7716) | Johannes Gäßler
2024-06-04 | ggml : remove OpenCL (#7735) | Georgi Gerganov
2024-06-04 | llama : remove beam search (#7736) | Georgi Gerganov
2024-06-03 | llama : offload to RPC in addition to other backends (#7640) | Radoslav Gerganov
2024-06-03 | ggml : use OpenMP as a thread pool (#7606) | Masaya, Kato
2024-06-03 | make: fix debug options not being applied to NVCC (#7714) | Johannes Gäßler
2024-06-01 | server : new UI (#7633) | Yazan Agha-Schrader
2024-06-01 | CUDA: quantized KV support for FA vec (#7527) | Johannes Gäßler
2024-05-31 | Improve HIP compatibility (#7672) | Daniele
2024-05-27 | make: add --device-debug to NVCC debug flags (#7542) | Johannes Gäßler
2024-05-23 | ggml : drop support for QK_K=64 (#7473) | Georgi Gerganov
2024-05-20 | ggml : add loongarch lsx and lasx support (#6454) | junchao-loongson
2024-05-20 | llama : remove MPI backend (#7395) | slaren
2024-05-17 | ROCm: use native CMake HIP support (#5966) | Gavin Zhao
2024-05-08 | Introduction of CUDA Graphs to LLama.cpp (#6766) | agray3
2024-05-04 | tests : add test-tokenizer-0.sh + fix some tokenizers (#7036) | Georgi Gerganov
2024-04-29 | llama : fix BPE pre-tokenization (#6920) | Georgi Gerganov
2024-04-29 | make : change GNU make default CXX from g++ to c++ (#6966) | Przemysław Pawełczyk
2024-04-26 | quantize: add imatrix and dataset metadata in GGUF (#6658) | Pierrick Hymbert
2024-04-22 | llamafile : improve sgemm.cpp (#6796) | Justine Tunney
2024-04-21 | `build`: generate hex dump of server assets during build (#6661) | Olivier Chafik
2024-04-21 | llama : add option to render special/control tokens (#6807) | Georgi Gerganov
2024-04-17 | llamafile : tmp disable + build sgemm.o when needed (#6716) | Georgi Gerganov
2024-04-16 | ggml : fix llamafile sgemm wdata offsets (#6710) | Georgi Gerganov
2024-04-16 | ggml : add llamafile sgemm (#6414) | Justine Tunney
2024-04-15 | `main`: add --json-schema / -j flag (#6659) | Olivier Chafik
2024-04-11 | Refactor Error Handling for CUDA (#6575) | Nikolas
2024-04-11 | eval-callback: Example how to use eval callback for debugging (#6576) | Pierrick Hymbert
2024-04-06 | Tests: Added integration tests for GBNF parser (#6472) | Clint Herron
2024-04-04 | examples : add GBNF validator program (#5948) | Clint Herron
2024-03-27 | make : whitespace | Georgi Gerganov
2024-03-26 | wpm : portable unicode tolower (#6305) | Jared Van Bortel
2024-03-26 | cuda : rename build flag to LLAMA_CUDA (#6299) | slaren
2024-03-25 | cuda : refactor into multiple files (#6269) | slaren
2024-03-25 | examples : add "retrieval" (#6193) | Minsoo Cheong
2024-03-23 | split: add gguf-split in the make build target (#6262) | Pierrick Hymbert
2024-03-23 | lookup: complement data from context with general text statistics (#5479) | Johannes Gäßler
2024-03-22 | cuda : add LLAMA_CUDA_NO_PEER_COPY to workaround broken ROCm p2p copy (#6208) | slaren
2024-03-21 | json-schema-to-grammar improvements (+ added to server) (#5978) | Olivier Chafik
2024-03-19 | gguf-split: split and merge gguf per batch of tensors (#6135) | Pierrick Hymbert
2024-03-17 | common: llama_load_model_from_url using --model-url (#6098) | Pierrick Hymbert
2024-03-15 | make : ggml-metal.o depends on ggml.h | Georgi Gerganov
2024-03-14 | metal : build metallib + fix embed path (#6015) | Georgi Gerganov