ik_llama.cpp.git: commit log (branch: main, path: llama.cpp)
Date       | Commit message                                                                    | Author
2024-03-18 | mpt : implement backwards compatibility with duped output tensor (#6139)          | Jared Van Bortel
2024-03-18 | backend : offload large batches to GPU (#6083)                                    | slaren
2024-03-15 | llama : fix Baichuan2 13B (#6092)                                                 | slaren
2024-03-15 | llama : add support for control vectors (#5970)                                   | Theia Vogel
2024-03-15 | llama : add Command-R support (#6033)                                             | Andrew Canis
2024-03-15 | fix set main gpu error (#6073)                                                    | Neo Zhang Jianyu
2024-03-15 | llama : add Orion chat template (#6066)                                           | Xuan Son Nguyen
2024-03-14 | llama : fix integer overflow during quantization (#6063)                          | Georgi Gerganov
2024-03-14 | llama : support models without vocabulary (#5798)                                 | Michael Podvitskiy
2024-03-14 | llama : fix typo                                                                  | Georgi Gerganov
2024-03-14 | llama : optimize defrag moves + fix fragmentation calculation (#6037)             | Michael Podvitskiy
2024-03-13 | llama : add pipeline parallelism support (#6017)                                  | slaren
2024-03-11 | grammar : fix unnecessarily retained pointer to rules (#6003)                     | gliptic
2024-03-11 | llama : more consistent names of count variables (#5994)                          | Georgi Gerganov
2024-03-11 | llama : refactor unicode stuff (#5992)                                            | Georgi Gerganov
2024-03-11 | ggml, ci : Windows ARM runner and build fixes (#5979)                             | Michael Podvitskiy
2024-03-11 | llama : fix F16/F32 downcast + improve names (#5980)                              | Georgi Gerganov
2024-03-10 | llama : add support for GritLM (#5959)                                            | DAN™
2024-03-09 | perplexity : support using multiple sequences to allow larger batch sizes (#5...  | slaren
2024-03-09 | ggml : remove old quantization functions (#5942)                                  | Georgi Gerganov
2024-03-08 | llama : support Mamba Selective State Space Models (#5328)                        | compilade
2024-03-08 | llama : fix quantization of shared token_embd (#5944)                             | compilade
2024-03-08 | llama : assume tied weights if lm_head/output weights is missing (#5824)          | Don Mahurin
2024-03-07 | Revert "[SYCL] fix error when set main gpu to non-zero (#5901)" (#5918)           | Neo Zhang Jianyu
2024-03-07 | server : refactor (#5882)                                                         | Georgi Gerganov
2024-03-07 | [SYCL] fix error when set main gpu to non-zero (#5901)                            | Neo Zhang Jianyu
2024-03-05 | Vulkan Improvements (#5835)                                                       | 0cc4m
2024-03-04 | llama : fix embeddings (#5796)                                                    | Georgi Gerganov
2024-03-04 | add alias for chat template (#5858)                                               | Xuan Son Nguyen
2024-03-03 | llama : allow for user specified embedding pooling type (#5849)                   | Douglas Hanley
2024-03-03 | llama : fix llama_copy_state_data with fragmented KV cache (#5840)                | compilade
2024-03-02 | llama : add abort_callback to interrupt computation (#5409)                       | Michael Podvitskiy
2024-03-02 | llama : refactor internal quantization functions (#5830)                          | Xuan Son Nguyen
2024-03-02 | llama : fix segfault from unknown model arch name (#5820)                         | compilade
2024-03-02 | Support multiple GPUs (split mode) on SYCL backend (#5806)                        | Neo Zhang Jianyu
2024-03-01 | llama : add StarCoder2 support (#5795)                                            | Sourab Mangrulkar
2024-03-01 | llama : cleanup unused mmq flags (#5772)                                          | Pierrick Hymbert
2024-03-01 | unicode : switch to multimap based nfd_map (#5799)                                | Douglas Hanley
2024-02-29 | llama : constified `llama_set_state_data`'s `src` (#5774)                         | Marcus Dunn
2024-02-28 | llama : remove deprecated API (#5770)                                             | Georgi Gerganov
2024-02-28 | llama : fix non-quantization of expert gating tensors (#5754)                     | compilade
2024-02-28 | llama : improve BERT tokenization (#5740)                                         | Douglas Hanley
2024-02-27 | IQ4_XS: a 4.25 bpw quantization (#5747)                                           | Kawrakow
2024-02-27 | llama : fix defrag bugs + add parameter (#5735)                                   | Georgi Gerganov
2024-02-26 | Adding IQ2_S and IQ2_M to complete coverage of the 2-3 bit quantization range...  | Kawrakow
2024-02-26 | [SYCL] Add support for soft_max ALiBi (#5639)                                     | AidanBeltonS
2024-02-26 | llama : fix Gemma rope type (#5691)                                               | Georgi Gerganov
2024-02-25 | llama : refactor k-shift implementation + KV defragmentation (#5691)              | Georgi Gerganov
2024-02-25 | code : normalize enum names (#5697)                                               | Georgi Gerganov
2024-02-24 | IQ3_S: a much better alternative to Q3_K (#5676)                                  | Kawrakow