ik_llama.cpp.git (branch: main)
Commit log for path: llama.cpp

Date        Author            Commit message
2024-04-25  slaren            llama : synchronize before get/set session data (#6911)
2024-04-25  slaren            llama : check that all the tensor data is in the model file (#6885)
2024-04-25  Georgi Gerganov   tests : minor bash stuff (#6902)
2024-04-25  jiez              quantize : add '--keep-split' to quantize model into shards (#6688)
2024-04-24  Douglas Hanley    llama : add llama_get_pooling_type function (#6862)
2024-04-24  Johannes Gäßler   Server: fix seed for multiple slots (#6835)
2024-04-24  Tristan Druyen    llama : add phi 3 chat template (#6857)
2024-04-24  liuwei-git        llama : add phi3 support (#6852)
2024-04-22  Georgi Gerganov   llama : fix typo in <|im_end|> token text (#6745)
2024-04-21  Georgi Gerganov   llama : add option to render special/control tokens (#6807)
2024-04-21  Wouter            llama : add llama-3 chat template (#6751)
2024-04-21  Pedro Cuenca      llama : support Llama 3 HF conversion (#6745)
2024-04-19  nopperl           Implement the OLMo architecture (#6741)
2024-04-18  slaren            ggml : group all experts in a single ggml_mul_mat_id (#6505)
2024-04-18  Ren Xuancheng     Qwen2 : assume tied weights if lm_head/output weights is missing (#6738)
2024-04-18  slaren            llama : fix compatibility with old 2 expert models (#6735)
2024-04-16  Georgi Gerganov   llama : make general.name optional (#6709)
2024-04-16  Ashish            llama : add StableLM2 12B (#6635)
2024-04-16  Shijie            llama : add qwen2moe (#6074)
2024-04-16  Daniel Bevenius   gguf : add special tokens metadata for FIM/Infill (#6689)
2024-04-15  compilade         llama : fix restoring the number of outputs from state files (#6687)
2024-04-14  David Renshaw     llama : add missing kv clear in llama_beam_search (#6664)
2024-04-14  Chao Jiang        Add Command R chat template (#6650)
2024-04-13  Pierrick Hymbert  model: support arch `DbrxForCausalLM` (#6515)
2024-04-12  jiez              llama : add gguf_remove_key + remove split meta during quantize (#6591)
2024-04-12  MasterYi1024      Correct free memory and total memory. (#6630)
2024-04-11  Clint Herron      Optimization: eliminate addition of redundant stacks when advancing grammar. ...
2024-04-11  Olivier Chafik    grammars: 1.5x faster inference w/ complex grammars (vector reserves / reuses...
2024-04-11  Pierrick Hymbert  eval-callback: Example how to use eval callback for debugging (#6576)
2024-04-10  slaren            llama : add model types for mixtral (#6589)
2024-04-09  Jared Van Bortel  BERT tokenizer fixes (#6498)
2024-04-09  Carolinabanana    llama : add Command R Plus support (#6491)
2024-04-08  Georgi Gerganov   llama : fix attention layer count sanity check (#6550)
2024-04-08  Georgi Gerganov   quantize : fix precedence of cli args (#6541)
2024-04-08  Rick G            llama : support negative ith in llama_get_ API (#6519)
2024-04-08  Jan Boon          llama : save and restore kv cache for single seq id (#6341)
2024-04-05  Brian             gguf.py : add licence and version to gguf writer (#6504)
2024-04-04  Clint Herron      examples : add GBNF validator program (#5948)
2024-04-03  bryanSwk          llama : add SEA-LION support (#6448)
2024-04-03  kaizau            Add OpenChat, Alpaca, Vicuna chat templates (#6397)
2024-04-03  slaren            ggml : mul_mat_id use the same tensor for all the experts (#6387)
2024-03-29  0cc4m             Vulkan k-quant mmq and ggml-backend offload functionality (#6155)
2024-03-29  hxer7963          [Model] Add support for xverse (#6301)
2024-03-29  Daniel Bevenius   llama : remove redundant reshape in build_kv_store (#6369)
2024-03-28  compilade         llama : fix command-r inference when omitting outputs (#6367)
2024-03-26  Jared Van Bortel  wpm : portable unicode tolower (#6305)
2024-03-26  compilade         llama : greatly reduce output buffer memory usage (#6122)
2024-03-26  Kawrakow          IQ1_M: 1.75 bpw quantization (#6302)
2024-03-26  Kawrakow          quantize : be able to override metadata by key (#6321)
2024-03-26  slaren            cuda : rename build flag to LLAMA_CUDA (#6299)