ik_llama.cpp.git

Age	Commit message	Author
2024-04-16	ggml : add llamafile sgemm (#6414)	Justine Tunney
2024-03-26	cuda : rename build flag to LLAMA_CUDA (#6299)	slaren
2024-03-21	Add ability to use Q5_0, Q5_1, and IQ4_NL for quantized K cache (#6183)	Kawrakow
2024-03-18	backend : offload large batches to GPU (#6083)	slaren
2024-03-15	llama-bench : use random tokens to improve accuracy with mixtral (#6069)	slaren
2024-03-14	gguf : fix resource leaks (#6061)	Steve Grubb
2024-03-13	llama : add pipeline parallelism support (#6017)	slaren
2024-03-07	llama-bench : add embeddings option (#5924)	Georgi Gerganov
2024-03-02	Support multiple GPUs (split mode) on SYCL backend (#5806)	Neo Zhang Jianyu
2024-03-01	llama : cleanup unused mmq flags (#5772)	Pierrick Hymbert
2024-02-25	code : normalize enum names (#5697)	Georgi Gerganov
2024-02-16	ggml : add numa options (#5377)	bmwl
2024-02-03	refactor : switch to emplace_back to avoid extra object (#5291)	Michael Klimenko
2024-02-01	add --no-mmap in llama-bench (#5257)	Neo Zhang Jianyu
2024-01-31	llama : remove LLAMA_MAX_DEVICES and LLAMA_SUPPORTS_GPU_OFFLOAD (#5240)	Georgi Gerganov
2024-01-30	kompute : llama-bench support and ggml_cpu_has_kompute() (#5226)	Jared Van Bortel
2024-01-28	ggml : add Vulkan backend (#2059)	0cc4m
2024-01-12	llama : ggml-backend integration (#4766)	slaren
2024-01-07	llama-bench : add no-kv-offload parameter (#4812)	slaren
2023-12-07	llama : per-layer KV cache + quantum K cache (#4309)	Georgi Gerganov
2023-11-02	build : link against build info instead of compiling against it (#3879)	cebtenzzre
2023-10-29	Extend llama_kv_cache_seq_rm to allow matching any sequence (#3843)	Kerfuffle
2023-10-23	llama : remove token functions with `context` args in favor of `model` (#3720)	Marcus Dunn
2023-09-28	build : enable more non-default compiler warnings (#3200)	Cebtenzzre
2023-09-28	llama.cpp : split llama_context_params into model and context params (#3301)	slaren
2023-09-28	llama : custom attention mask + parallel decoding + no context swaps (#3228)	Georgi Gerganov
2023-09-27	metal : reusing llama.cpp logging (#3152)	Rickard Hallerbäck
2023-09-15	sync : ggml (Metal F32 support + reduce ggml-alloc size) (#3192)	Georgi Gerganov
2023-09-07	llama-bench : use two tokens in the warmup run for prompt evals (#3059)	slaren
2023-09-05	examples : replace fprintf to stdout with printf (#3017)	Cebtenzzre
2023-09-04	llama-bench : make cpp file non-executable (#2999)	Cebtenzzre
2023-08-28	llama-bench : set locale to utf8 (#2832)	slaren
2023-08-25	llama-bench : add model sizes (#2771)	slaren
2023-08-25	ROCm Port (#1087)	Henri Vasserman
2023-08-22	llama-bench : minor fixes (#2695)	slaren
2023-08-21	gguf : new file format with flexible meta data (beta) (#2398)	Georgi Gerganov
2023-08-18	llama : add benchmark example (#2626)	slaren