ik_llama.cpp.git

Age	Commit message	Author
2024-03-26	quantize : be able to override metadata by key (#6321)	Kawrakow
2024-03-26	embedding : adjust `n_ubatch` value (#6296)	Minsoo Cheong
2024-03-26	server : add `n_discard` parameter (#6300)	Jan Boon
2024-03-25	nix: make `xcrun` visible in Nix sandbox for precompiling Metal shaders (#6118)	Joseph Stahl
2024-03-26	cuda : rename build flag to LLAMA_CUDA (#6299)	slaren
2024-03-25	nix: fix blas support (#6281)	Christian Kögler
2024-03-25	tests : include IQ2_XXS and IQ2_XS in test-quantize-fns (#6303)	Kawrakow
2024-03-25	flake.lock: Update (#6266)	Georgi Gerganov
2024-03-25	cuda : fix LLAMA_CUDA_F16 build (#6298)	slaren
2024-03-25	cuda : refactor into multiple files (#6269)	slaren
2024-03-25	Server: clean up OAI params parsing function (#6284)	Xuan Son Nguyen
2024-03-25	[SYCL] fix SYCL backend build on windows is break by LOG() error (#6290)	Neo Zhang Jianyu
2024-03-25	examples : add "retrieval" (#6193)	Minsoo Cheong
2024-03-25	ggml : support AVX512VNNI (#6280)	Justine Tunney
2024-03-24	Fix heap corruption from wmode out-of-bound writes on windows (#6272)	Rick G
2024-03-24	imatrix : fix wname for mul_mat_id ops (#6271)	Georgi Gerganov
2024-03-24	Fixed lookup compilation issues on Windows (#6273)	Johannes Gäßler
2024-03-24	ci : close inactive issue, increase operations per run (#6270)	Pierrick Hymbert
2024-03-24	sampling : deduplicated code for probability distribution access (#6240)	Minsoo Cheong
2024-03-24	[SYCL] offload op (#6217)	Meng, Hengyu
2024-03-24	Support build win release for SYCL (#6241)	Neo Zhang Jianyu
2024-03-23	use _wfopen instead of fopen on Windows (#6248)	Jared Van Bortel
2024-03-23	gitignore : gguf-split	Georgi Gerganov
2024-03-23	common: llama_load_model_from_url split support (#6192)	Pierrick Hymbert
2024-03-23	server: docs: `--threads` and `--threads`, `--ubatch-size`, `--log-disable` (...	Pierrick Hymbert
2024-03-23	llama : add grok-1 support (#6204)	Julius Arkenberg
2024-03-23	split: add gguf-split in the make build target (#6262)	Pierrick Hymbert
2024-03-23	server: flush stdout after logging in both text and json layout (#6253)	Pierrick Hymbert
2024-03-23	lookup: complement data from context with general text statistics (#5479)	Johannes Gäßler
2024-03-22	common : default --hf-file to --model (#6234)	Georgi Gerganov
2024-03-22	convert-llama2c-to-ggml : enable conversion of GQA models (#6237)	fraxy-v
2024-03-22	quantize: options for output and token embedding tensors qtype (#6239)	Kawrakow
2024-03-22	llama_model_loader: support multiple split/shard GGUFs (#6187)	Pierrick Hymbert
2024-03-22	ci: apply concurrency limit for github workflows (#6243)	Minsoo Cheong
2024-03-22	common : add HF arg helpers (#6234)	Georgi Gerganov
2024-03-22	llama : correction of the attn.v.weight quantization for IQ3_XS (#6209)	Nexesenex
2024-03-22	tests : conditional python & node json schema tests (#6207)	Olivier Chafik
2024-03-22	json-schema-to-grammar : fix order of props + non-str const/enum (#6232)	Olivier Chafik
2024-03-22	cuda : add LLAMA_CUDA_NO_PEER_COPY to workaround broken ROCm p2p copy (#6208)	slaren
2024-03-22	readme : add RecurseChat to the list of UIs (#6219)	Xiaoyi Chen
2024-03-22	server : fix n_keep always showing as 0 in response (#6211)	Jan Boon
2024-03-22	server : enable continuous batching by default (#6231)	Georgi Gerganov
2024-03-22	metal : proper assert for mat-mat memory alignment (#6225)	Georgi Gerganov
2024-03-22	ci : add CURL flag for the mac builds (#6214)	Vaibhav Srivastav
2024-03-22	metal : pad n_ctx by 32 (#6177)	Georgi Gerganov
2024-03-22	add blog link (#6222)	Neo Zhang Jianyu
2024-03-22	Fix params underscore convert to dash. (#6203)	DAN™
2024-03-21	server : update readme doc from `slot_id` to `id_slot` (#6213)	Jan Boon
2024-03-21	cuda : disable host register by default (#6206)	slaren
2024-03-21	Corrected typo to wrong file (#6199)	semidark