ik_llama.cpp.git

Age	Commit message	Author
2024-12-03	Q8_0_R4 (#120)	Kawrakow
2024-12-02	Q4_0_R4 (#119)	Kawrakow
2024-12-02	IQ4_NL_X4 (#118)	Kawrakow
2024-10-25	Bitnet changes (#106)	Kawrakow
2024-10-18	CLI - Specify GGML_TYPE to quantize for the main tensors. (#91)	Nexes the Elder
2024-10-16	Adding IQ4_KSS: 4.0 bpw quants (#89)	Kawrakow
2024-10-13	IQ2_KS: 2.1875 bpw non-linear quantization (#85)	Kawrakow
2024-10-10	Better model info (#84)	Kawrakow
2024-10-09	New SOTA quantization: 4.25 bpw IQ4_KS (#83)	Kawrakow
2024-10-02	Adding Q6_0 (#77)	Kawrakow
2024-09-27	Adding ability to have meta data per tensor row (#61)	Kawrakow
2024-09-09	Adding IQ1_TN - 1.6875 bpw for TriLM ternary models (#44)	Kawrakow
2024-09-05	Zen4 Flash Attention - bf16 support (#38)	Kawrakow
2024-08-20	Fused soft cap and SIMD-ified GeLU (#9)	Kawrakow
2024-08-19	quantize_stats: print rmse and max error as fraction of <x> (#21)	Kawrakow
2024-08-12	Merge mainline - Aug 12 2024 (#17)	Kawrakow
2024-08-09	iq6_k: WIP (quantize/dequantize)	Iwan Kawrakow
2024-08-07	Adding IQ2_TN for use with ternary models (#13)	Kawrakow
2024-08-05	q2_K: allow it to detect ternary nets and quantize accordingly	Iwan Kawrakow
2024-08-01	iq3_k: Basics	Iwan Kawrakow
2024-08-01	iq5_k: Basics	Iwan Kawrakow
2024-08-01	iq2_k: Basics	Iwan Kawrakow
2024-07-28	IQ4_K: SOTA 4-bit quantization (#6)	Kawrakow
2024-07-27	Merge mainline llama.cpp (#3)	Kawrakow
2024-07-24	Add copyright notices	Iwan Kawrakow
2024-06-26	imatrix: be able to specify the name of the output tensor	Iwan Kawrakow
2024-06-24	Bitnet: tiny bity faster 1.625 bpw variant on Metal	Iwan Kawrakow
2024-06-22	bitnet: add 2 bpw quantization	Iwan Kawrakow
2024-06-22	bitnet: CUDA, scalar, AVX2	Iwan Kawrakow
2024-06-21	llama : allow pooled embeddings on any model (#7477)	Douglas Hanley
2024-06-21	swiftui : enable stream updating (#7754)	Shuichi Tsutsumi
2024-06-20	[SYCL] Fix windows build and inference (#8003)	luoyu-intel
2024-06-20	server : fix smart slot selection (#8020)	sasha0552
2024-06-18	Only use FIM middle token if it exists (#7648)	Sigbjørn Skjæret
2024-06-17	Add support for sqrt on CUDA (#7953)	Calvin Laurenson
2024-06-15	Add `cvector-generator` example (#7514)	Xuan Son Nguyen
2024-06-14	llama-bench : fix RPC indication (#7936)	Radoslav Gerganov
2024-06-13	move BLAS to a separate backend (#6210)	slaren
2024-06-13	`build`: rename main → llama-cli, server → llama-server, llava-cli → ll...	Olivier Chafik
2024-06-12	server : restore numeric prompts (#7883)	Georgi Gerganov
2024-06-11	llama-bench: more compact markdown tables (#7879)	Johannes Gäßler
2024-06-11	json: refine constraint for whitespace to avoid runaways yet allow pretty pri...	Olivier Chafik
2024-06-11	`json`: document schema conversion in GBNF readme, align manual grammar examp...	Olivier Chafik
2024-06-10	examples : remove --instruct remnants (#7846)	Georgi Gerganov
2024-06-10	server : improve "prompt" handling (#7847)	Georgi Gerganov
2024-06-09	imatrix : handle partial entries (#7833)	Georgi Gerganov
2024-06-09	server: do not remove whitespace at the start of a completion chunk (#7830)	mgroeber9110
2024-06-09	Revert "[SYCL] Update rpc-server.cpp to include SYCL backend (#7682)" (#7808)	slaren
2024-06-08	server : smart slot selection using Longest Common Prefix (#7728)	sasha0552
2024-06-07	gguf-split : change binary multi-byte units to decimal (#7803)	Christian Zhou-Zheng