path: root/llama.cpp
Age        | Commit message | Author
2024-03-29 | Vulkan k-quant mmq and ggml-backend offload functionality (#6155) | 0cc4m
2024-03-29 | [Model] Add support for xverse (#6301) | hxer7963
2024-03-29 | llama : remove redundant reshape in build_kv_store (#6369) | Daniel Bevenius
2024-03-28 | llama : fix command-r inference when omitting outputs (#6367) | compilade
2024-03-26 | wpm : portable unicode tolower (#6305) | Jared Van Bortel
2024-03-26 | llama : greatly reduce output buffer memory usage (#6122) | compilade
2024-03-26 | IQ1_M: 1.75 bpw quantization (#6302) | Kawrakow
2024-03-26 | quantize : be able to override metadata by key (#6321) | Kawrakow
2024-03-26 | cuda : rename build flag to LLAMA_CUDA (#6299) | slaren
2024-03-24 | [SYCL] offload op (#6217) | Meng, Hengyu
2024-03-23 | use _wfopen instead of fopen on Windows (#6248) | Jared Van Bortel
2024-03-23 | common: llama_load_model_from_url split support (#6192) | Pierrick Hymbert
2024-03-23 | llama : add grok-1 support (#6204) | Julius Arkenberg
2024-03-22 | quantize: options for output and token embedding tensors qtype (#6239) | Kawrakow
2024-03-22 | llama_model_loader: support multiple split/shard GGUFs (#6187) | Pierrick Hymbert
2024-03-22 | llama : correction of the attn.v.weight quantization for IQ3_XS (#6209) | Nexesenex
2024-03-22 | metal : pad n_ctx by 32 (#6177) | Georgi Gerganov
2024-03-18 | mpt : implement backwards compatibility with duped output tensor (#6139) | Jared Van Bortel
2024-03-18 | backend : offload large batches to GPU (#6083) | slaren
2024-03-15 | llama : fix Baichuan2 13B (#6092) | slaren
2024-03-15 | llama : add support for control vectors (#5970) | Theia Vogel
2024-03-15 | llama : add Command-R support (#6033) | Andrew Canis
2024-03-15 | fix error when setting main GPU (#6073) | Neo Zhang Jianyu
2024-03-15 | llama : add Orion chat template (#6066) | Xuan Son Nguyen
2024-03-14 | llama : fix integer overflow during quantization (#6063) | Georgi Gerganov
2024-03-14 | llama : support models without vocabulary (#5798) | Michael Podvitskiy
2024-03-14 | llama : fix typo | Georgi Gerganov
2024-03-14 | llama : optimize defrag moves + fix fragmentation calculation (#6037) | Michael Podvitskiy
2024-03-13 | llama : add pipeline parallelism support (#6017) | slaren
2024-03-11 | grammar : fix unnecessarily retained pointer to rules (#6003) | gliptic
2024-03-11 | llama : more consistent names of count variables (#5994) | Georgi Gerganov
2024-03-11 | llama : refactor unicode stuff (#5992) | Georgi Gerganov
2024-03-11 | ggml, ci : Windows ARM runner and build fixes (#5979) | Michael Podvitskiy
2024-03-11 | llama : fix F16/F32 downcast + improve names (#5980) | Georgi Gerganov
2024-03-10 | llama : add support for GritLM (#5959) | DAN™
2024-03-09 | perplexity : support using multiple sequences to allow larger batch sizes (#5... | slaren
2024-03-09 | ggml : remove old quantization functions (#5942) | Georgi Gerganov
2024-03-08 | llama : support Mamba Selective State Space Models (#5328) | compilade
2024-03-08 | llama : fix quantization of shared token_embd (#5944) | compilade
2024-03-08 | llama : assume tied weights if lm_head/output weight is missing (#5824) | Don Mahurin
2024-03-07 | Revert "[SYCL] fix error when set main gpu to non-zero (#5901)" (#5918) | Neo Zhang Jianyu
2024-03-07 | server : refactor (#5882) | Georgi Gerganov
2024-03-07 | [SYCL] fix error when set main gpu to non-zero (#5901) | Neo Zhang Jianyu
2024-03-05 | Vulkan Improvements (#5835) | 0cc4m
2024-03-04 | llama : fix embeddings (#5796) | Georgi Gerganov
2024-03-04 | add alias for chat template (#5858) | Xuan Son Nguyen
2024-03-03 | llama : allow for user specified embedding pooling type (#5849) | Douglas Hanley
2024-03-03 | llama : fix llama_copy_state_data with fragmented KV cache (#5840) | compilade
2024-03-02 | llama : add abort_callback to interrupt computation (#5409) | Michael Podvitskiy
2024-03-02 | llama : refactor internal quantization functions (#5830) | Xuan Son Nguyen
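
For context on the 2024-03-02 entry "llama : add abort_callback to interrupt computation (#5409)": the callback lets a host application stop a long-running decode from another thread. Below is a minimal C sketch of how it can be wired up, assuming the public llama.h API from around these commits (llama_set_abort_callback, llama_load_model_from_file, llama_new_context_with_model); the model path "model.gguf", the g_stop flag, and the should_abort helper are illustrative placeholders, not part of the commit itself.

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stddef.h>
    #include "llama.h"

    /* Hypothetical stop flag, set from another thread or a signal handler. */
    static atomic_bool g_stop;

    /* Matches ggml_abort_callback: returning true aborts the graph compute. */
    static bool should_abort(void * data) {
        (void) data;
        return atomic_load(&g_stop);
    }

    int main(void) {
        llama_backend_init();

        struct llama_model_params mparams = llama_model_default_params();
        struct llama_model * model =
            llama_load_model_from_file("model.gguf", mparams); /* placeholder path */
        if (model == NULL) {
            return 1;
        }

        struct llama_context_params cparams = llama_context_default_params();
        struct llama_context * ctx = llama_new_context_with_model(model, cparams);

        /* Register the callback; decoding stops early once it returns true. */
        llama_set_abort_callback(ctx, should_abort, NULL);

        /* ... call llama_decode as usual; set g_stop elsewhere to interrupt ... */

        llama_free(ctx);
        llama_free_model(model);
        llama_backend_free();
        return 0;
    }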