path: root/llama.h
Age        | Commit message | Author
2024-03-03 | llama : allow for user specified embedding pooling type (#5849) | Douglas Hanley
2024-03-02 | llama : add abort_callback to interrupt computation (#5409) | Michael Podvitskiy
2024-03-01 | llama : cleanup unused mmq flags (#5772) | Pierrick Hymbert
2024-02-29 | llama : constified `llama_set_state_data`'s `src` (#5774) | Marcus Dunn
2024-02-28 | llama : remove deprecated API (#5770) | Georgi Gerganov
2024-02-27 | IQ4_XS: a 4.25 bpw quantization (#5747) | Kawrakow
2024-02-27 | llama : fix defrag bugs + add parameter (#5735) | Georgi Gerganov
2024-02-26 | Adding IQ2_S and IQ2_M to complete coverage of the 2-3 bit quantization range... | Kawrakow
2024-02-25 | llama : refactor k-shift implementation + KV defragmentation (#5691) | Georgi Gerganov
2024-02-25 | code : normalize enum names (#5697) | Georgi Gerganov
2024-02-24 | IQ3_S: a much better alternative to Q3_K (#5676) | Kawrakow
2024-02-22 | Add docs for llama_chat_apply_template (#5645) | Xuan Son Nguyen
2024-02-21 | IQ4_NL: 4-bit non-linear quants with blocks of 32 (#5590) | Kawrakow
2024-02-19 | llama : add llama_chat_apply_template() (#5538) | Xuan Son Nguyen
2024-02-18 | 1.5 bit quantization (#5453) | Kawrakow
2024-02-16 | ggml : add numa options (#5377) | bmwl
2024-02-15 | Use correct type of pooling for embedding models (#5500) | Douglas Hanley
2024-02-13 | llama : support batched embeddings (#5466) | Douglas Hanley
2024-02-11 | Add support for BERT embedding models (#5423) | Douglas Hanley
2024-02-03 | YaRN : store rope scaling type as int32_t in memory (#5285) | Jared Van Bortel
2024-01-31 | llama : remove LLAMA_MAX_DEVICES and LLAMA_SUPPORTS_GPU_OFFLOAD (#5240) | Georgi Gerganov
2024-01-30 | SOTA 3-bit quants (#5196) | Kawrakow
2024-01-29 | Nomic Vulkan backend (#4456) | Jared Van Bortel
2024-01-28 | ggml : add Vulkan backend (#2059) | 0cc4m
2024-01-28 | ggml : add unified SYCL backend for Intel GPUs (#2690) | Abhilash Majumder
2024-01-25 | llama : dynamic temperature sampling (#4972) | l3utterfly
2024-01-22 | llama : add Q3_K_XS (#5060) | Kawrakow
2024-01-17 | backend : add eval callback (#4935) | Georgi Gerganov
2024-01-15 | llama : apply classifier-free guidance to logits directly (#4951) | David Friehs
2024-01-14 | 2-bit quantizations (#4897) | Kawrakow
2024-01-13 | llama : minimize size used for state save/load (#4820) | David Friehs
2024-01-12 | llama : ggml-backend integration (#4766) | slaren
2024-01-11 | llama : restore intended k-quants mixes for MoE models (#4872) | Kawrakow
2024-01-11 | ggml : SOTA 2-bit quants (add IQ2_XS) (#4856) | Kawrakow
2024-01-08 | SOTA 2-bit quants (#4773) | Kawrakow
2024-01-08 | main : add self-extend support (#4815) | Georgi Gerganov
2024-01-08 | examples : add passkey test (#3856) | Georgi Gerganov
2024-01-02 | llama : replace all API facing `int`'s with `int32_t` (#4577) | Marcus Dunn
2023-12-22 | llama : add ability to cancel model loading (#4462) | crasm
2023-12-21 | llama : allow getting n_batch from llama_context in c api (#4540) | Marcus Dunn
2023-12-16 | lora : add support for non-llama models (#3333) | slaren
2023-12-12 | llama : document logits_all deprecation (#4418) | crasm
2023-12-07 | llama : per-layer KV cache + quantum K cache (#4309) | Georgi Gerganov
2023-12-05 | llama : allow overriding GGUF metadata when loading model (#4092) | Kerfuffle
2023-11-25 | Update docs for yarn_ext_factor <0.0 as unspecified instead of NaN (#4189) | crasm
2023-11-23 | llama : KV cache view API + better KV cache management (#4170) | Georgi Gerganov
2023-11-17 | llama : add functions to get the model's metadata (#4013) | slaren
2023-11-16 | Respect tokenizer.ggml.add_bos_token value when tokenizing (#4040) | Kerfuffle
2023-11-03 | common : YAYF (yet another YARN fix) (#3925) | Georgi Gerganov
2023-11-01 | llama : implement YaRN RoPE scaling (#2268) | cebtenzzre