path: root/llama.cpp

Date       | Commit message | Author
2024-04-13 | model: support arch `DbrxForCausalLM` (#6515) | Pierrick Hymbert
2024-04-12 | llama : add gguf_remove_key + remove split meta during quantize (#6591) | jiez
2024-04-12 | Correct free memory and total memory. (#6630) | MasterYi1024
2024-04-11 | Optimization: eliminate addition of redundant stacks when advancing grammar. ... | Clint Herron
2024-04-11 | grammars: 1.5x faster inference w/ complex grammars (vector reserves / reuses... | Olivier Chafik
2024-04-11 | eval-callback: Example how to use eval callback for debugging (#6576) | Pierrick Hymbert
2024-04-10 | llama : add model types for mixtral (#6589) | slaren
2024-04-09 | BERT tokenizer fixes (#6498) | Jared Van Bortel
2024-04-09 | llama : add Command R Plus support (#6491) | Carolinabanana
2024-04-08 | llama : fix attention layer count sanity check (#6550) | Georgi Gerganov
2024-04-08 | quantize : fix precedence of cli args (#6541) | Georgi Gerganov
2024-04-08 | llama : support negative ith in llama_get_ API (#6519) | Rick G
2024-04-08 | llama : save and restore kv cache for single seq id (#6341) | Jan Boon
2024-04-05 | gguf.py : add licence and version to gguf writer (#6504) | Brian
2024-04-04 | examples : add GBNF validator program (#5948) | Clint Herron
2024-04-03 | llama : add SEA-LION support (#6448) | bryanSwk
2024-04-03 | Add OpenChat, Alpaca, Vicuna chat templates (#6397) | kaizau
2024-04-03 | ggml : mul_mat_id use the same tensor for all the experts (#6387) | slaren
2024-03-29 | Vulkan k-quant mmq and ggml-backend offload functionality (#6155) | 0cc4m
2024-03-29 | [Model] Add support for xverse (#6301) | hxer7963
2024-03-29 | llama : remove redundant reshape in build_kv_store (#6369) | Daniel Bevenius
2024-03-28 | llama : fix command-r inference when omitting outputs (#6367) | compilade
2024-03-26 | wpm : portable unicode tolower (#6305) | Jared Van Bortel
2024-03-26 | llama : greatly reduce output buffer memory usage (#6122) | compilade
2024-03-26 | IQ1_M: 1.75 bpw quantization (#6302) | Kawrakow
2024-03-26 | quantize : be able to override metadata by key (#6321) | Kawrakow
2024-03-26 | cuda : rename build flag to LLAMA_CUDA (#6299) | slaren
2024-03-24 | [SYCL] offload op (#6217) | Meng, Hengyu
2024-03-23 | use _wfopen instead of fopen on Windows (#6248) | Jared Van Bortel
2024-03-23 | common: llama_load_model_from_url split support (#6192) | Pierrick Hymbert
2024-03-23 | llama : add grok-1 support (#6204) | Julius Arkenberg
2024-03-22 | quantize: options for output and token embedding tensors qtype (#6239) | Kawrakow
2024-03-22 | llama_model_loader: support multiple split/shard GGUFs (#6187) | Pierrick Hymbert
2024-03-22 | llama : correction of the attn.v.weight quantization for IQ3_XS (#6209) | Nexesenex
2024-03-22 | metal : pad n_ctx by 32 (#6177) | Georgi Gerganov
2024-03-18 | mpt : implement backwards compatiblity with duped output tensor (#6139) | Jared Van Bortel
2024-03-18 | backend : offload large batches to GPU (#6083) | slaren
2024-03-15 | llama : fix Baichuan2 13B (#6092) | slaren
2024-03-15 | llama : add support for control vectors (#5970) | Theia Vogel
2024-03-15 | llama : add Command-R support (#6033) | Andrew Canis
2024-03-15 | fix set main gpu error (#6073) | Neo Zhang Jianyu
2024-03-15 | llama : add Orion chat template (#6066) | Xuan Son Nguyen
2024-03-14 | llama : fix integer overflow during quantization (#6063) | Georgi Gerganov
2024-03-14 | llama : support models without vocabulary (#5798) | Michael Podvitskiy
2024-03-14 | llama : fix typo | Georgi Gerganov
2024-03-14 | llama : optimize defrag moves + fix fragmentation calculation (#6037) | Michael Podvitskiy
2024-03-13 | llama : add pipeline parallelism support (#6017) | slaren
2024-03-11 | grammar : fix unnecessarily retained pointer to rules (#6003) | gliptic
2024-03-11 | llama : more consistent names of count variables (#5994) | Georgi Gerganov
2024-03-11 | llama : refactor unicode stuff (#5992) | Georgi Gerganov