path: root/examples
| Age | Commit message | Author |
|---|---|---|
| 2024-03-26 | IQ1_M: 1.75 bpw quantization (#6302) | Kawrakow |
| 2024-03-26 | quantize : be able to override metadata by key (#6321) | Kawrakow |
| 2024-03-26 | embedding : adjust `n_ubatch` value (#6296) | Minsoo Cheong |
| 2024-03-26 | server : add `n_discard` parameter (#6300) | Jan Boon |
| 2024-03-26 | cuda : rename build flag to LLAMA_CUDA (#6299) | slaren |
| 2024-03-25 | Server: clean up OAI params parsing function (#6284) | Xuan Son Nguyen |
| 2024-03-25 | [SYCL] fix SYCL backend build on windows is break by LOG() error (#6290) | Neo Zhang Jianyu |
| 2024-03-25 | examples : add "retrieval" (#6193) | Minsoo Cheong |
| 2024-03-24 | imatrix : fix wname for mul_mat_id ops (#6271) | Georgi Gerganov |
| 2024-03-24 | sampling : deduplicated code for probability distribution access (#6240) | Minsoo Cheong |
| 2024-03-23 | common: llama_load_model_from_url split support (#6192) | Pierrick Hymbert |
| 2024-03-23 | server: docs: `--threads` and `--threads`, `--ubatch-size`, `--log-disable` (... | Pierrick Hymbert |
| 2024-03-23 | server: flush stdout after logging in both text and json layout (#6253) | Pierrick Hymbert |
| 2024-03-23 | lookup: complement data from context with general text statistics (#5479) | Johannes Gäßler |
| 2024-03-22 | convert-llama2c-to-ggml : enable conversion of GQA models (#6237) | fraxy-v |
| 2024-03-22 | quantize: options for output and token embedding tensors qtype (#6239) | Kawrakow |
| 2024-03-22 | llama_model_loader: support multiple split/shard GGUFs (#6187) | Pierrick Hymbert |
| 2024-03-22 | json-schema-to-grammar : fix order of props + non-str const/enum (#6232) | Olivier Chafik |
| 2024-03-22 | server : fix n_keep always showing as 0 in response (#6211) | Jan Boon |
| 2024-03-22 | server : enable continuous batching by default (#6231) | Georgi Gerganov |
| 2024-03-22 | metal : pad n_ctx by 32 (#6177) | Georgi Gerganov |
| 2024-03-21 | server : update readme doc from `slot_id` to `id_slot` (#6213) | Jan Boon |
| 2024-03-21 | json-schema-to-grammar improvements (+ added to server) (#5978) | Olivier Chafik |
| 2024-03-21 | Add ability to use Q5_0, Q5_1, and IQ4_NL for quantized K cache (#6183) | Kawrakow |
| 2024-03-20 | llava : update MobileVLM-README.md (#6180) | Ziang Wu |
| 2024-03-20 | llava : add MobileVLM_V2 backup (#6175) | Ziang Wu |
| 2024-03-20 | Server: version bump for httplib and json (#6169) | Xuan Son Nguyen |
| 2024-03-20 | server : allow to override -ngl in tests (#6170) | Georgi Gerganov |
| 2024-03-20 | Revert "llava : add a MobileVLM_V2-1.7B backup (#6152)" | Georgi Gerganov |
| 2024-03-20 | llava : add a MobileVLM_V2-1.7B backup (#6152) | Ziang Wu |
| 2024-03-20 | Server: Handle n_keep parameter in the request (#6174) | Karthick |
| 2024-03-20 | server tests : more pythonic process management; fix bare `except:` (#6146) | Jared Van Bortel |
| 2024-03-20 | update readme sycl for new update (#6151) | Neo Zhang Jianyu |
| 2024-03-19 | Remove undeed header file. (#6158) | DAN™ |
| 2024-03-19 | gguf-split: split and merge gguf per batch of tensors (#6135) | Pierrick Hymbert |
| 2024-03-18 | clip : fix memory leak (#6138) | Felix |
| 2024-03-18 | backend : offload large batches to GPU (#6083) | slaren |
| 2024-03-17 | common: llama_load_model_from_url using --model-url (#6098) | Pierrick Hymbert |
| 2024-03-16 | gritlm : add initial README.md (#6086) | Daniel Bevenius |
| 2024-03-15 | llava : change API to pure C style for Rust FFI bindgen (#6079) | Ting Lou |
| 2024-03-15 | fix set main gpu error (#6073) | Neo Zhang Jianyu |
| 2024-03-15 | llama-bench : use random tokens to improve accuracy with mixtral (#6069) | slaren |
| 2024-03-14 | gguf : fix resource leaks (#6061) | Steve Grubb |
| 2024-03-14 | embedding : add EOS token if not present (#899) | Georgi Gerganov |
| 2024-03-14 | readme : improve readme for Llava-1.6 example (#6044) | Jian Liao |
| 2024-03-14 | server: disable debug release type sanitizer, simplify trigger (#6047) | Pierrick Hymbert |
| 2024-03-14 | embedding : print all resulting embeddings (#899) | Georgi Gerganov |
| 2024-03-14 | embedding : print cosine similarity (#899) | Georgi Gerganov |
| 2024-03-13 | llama : add pipeline parallelism support (#6017) | slaren |
| 2024-03-13 | Server: Use multi-task for embeddings endpoint (#6001) | Xuan Son Nguyen |