ik_llama.cpp.git

Age	Commit message	Author
2024-06-22	iqk_mul_mat for llama.cpp	Iwan Kawrakow
2024-06-21	JSON Schema to GBNF integration tests (#7790)	Clint Herron
2024-06-18	Allow compiling with CUDA without CUDA runtime installed (#7989)	Ulrich Drepper
2024-06-16	Vulkan Shader Refactor, Memory Debugging Option (#7947)	0cc4m
2024-06-15	Add `cvector-generator` example (#7514)	Xuan Son Nguyen
2024-06-13	move BLAS to a separate backend (#6210)	slaren
2024-06-13	`build`: rename main → llama-cli, server → llama-server, llava-cli → ll...	Olivier Chafik
2024-06-05	CUDA: refactor mmq, dmmv, mmvq (#7716)	Johannes Gäßler
2024-06-04	ggml : remove OpenCL (#7735)	Georgi Gerganov
2024-06-04	llama : remove beam search (#7736)	Georgi Gerganov
2024-06-03	llama : offload to RPC in addition to other backends (#7640)	Radoslav Gerganov
2024-06-03	ggml : use OpenMP as a thread pool (#7606)	Masaya, Kato
2024-06-03	make: fix debug options not being applied to NVCC (#7714)	Johannes Gäßler
2024-06-01	server : new UI (#7633)	Yazan Agha-Schrader
2024-06-01	CUDA: quantized KV support for FA vec (#7527)	Johannes Gäßler
2024-05-31	Improve HIP compatibility (#7672)	Daniele
2024-05-27	make: add --device-debug to NVCC debug flags (#7542)	Johannes Gäßler
2024-05-23	ggml : drop support for QK_K=64 (#7473)	Georgi Gerganov
2024-05-20	ggml : add loongarch lsx and lasx support (#6454)	junchao-loongson
2024-05-20	llama : remove MPI backend (#7395)	slaren
2024-05-17	ROCm: use native CMake HIP support (#5966)	Gavin Zhao
2024-05-08	Introduction of CUDA Graphs to LLama.cpp (#6766)	agray3
2024-05-04	tests : add test-tokenizer-0.sh + fix some tokenizers (#7036)	Georgi Gerganov
2024-04-29	llama : fix BPE pre-tokenization (#6920)	Georgi Gerganov
2024-04-29	make : change GNU make default CXX from g++ to c++ (#6966)	Przemysław Pawełczyk
2024-04-26	quantize: add imatrix and dataset metadata in GGUF (#6658)	Pierrick Hymbert
2024-04-22	llamafile : improve sgemm.cpp (#6796)	Justine Tunney
2024-04-21	`build`: generate hex dump of server assets during build (#6661)	Olivier Chafik
2024-04-21	llama : add option to render special/control tokens (#6807)	Georgi Gerganov
2024-04-17	llamafile : tmp disable + build sgemm.o when needed (#6716)	Georgi Gerganov
2024-04-16	ggml : fix llamafile sgemm wdata offsets (#6710)	Georgi Gerganov
2024-04-16	ggml : add llamafile sgemm (#6414)	Justine Tunney
2024-04-15	`main`: add --json-schema / -j flag (#6659)	Olivier Chafik
2024-04-11	Refactor Error Handling for CUDA (#6575)	Nikolas
2024-04-11	eval-callback: Example how to use eval callback for debugging (#6576)	Pierrick Hymbert
2024-04-06	Tests: Added integration tests for GBNF parser (#6472)	Clint Herron
2024-04-04	examples : add GBNF validator program (#5948)	Clint Herron
2024-03-27	make : whitespace	Georgi Gerganov
2024-03-26	wpm : portable unicode tolower (#6305)	Jared Van Bortel
2024-03-26	cuda : rename build flag to LLAMA_CUDA (#6299)	slaren
2024-03-25	cuda : refactor into multiple files (#6269)	slaren
2024-03-25	examples : add "retrieval" (#6193)	Minsoo Cheong
2024-03-23	split: add gguf-split in the make build target (#6262)	Pierrick Hymbert
2024-03-23	lookup: complement data from context with general text statistics (#5479)	Johannes Gäßler
2024-03-22	cuda : add LLAMA_CUDA_NO_PEER_COPY to workaround broken ROCm p2p copy (#6208)	slaren
2024-03-21	json-schema-to-grammar improvements (+ added to server) (#5978)	Olivier Chafik
2024-03-19	gguf-split: split and merge gguf per batch of tensors (#6135)	Pierrick Hymbert
2024-03-17	common: llama_load_model_from_url using --model-url (#6098)	Pierrick Hymbert
2024-03-15	make : ggml-metal.o depends on ggml.h	Georgi Gerganov
2024-03-14	metal : build metallib + fix embed path (#6015)	Georgi Gerganov