Commit log for path: root/ggml/src/ggml-cuda
Age        | Commit message                                                              | Author
2025-02-07 | cuda: non-contiguous rms norm (#190)                                        | Kawrakow
2024-11-21 | MMQ for Q6_0 (#115)                                                         | Kawrakow
2024-10-31 | Faster MoE inference (#112)                                                 | Kawrakow
2024-10-26 | Bitnet CUDA improvements (#109)                                             | Kawrakow
2024-10-25 | Bitnet changes (#106)                                                       | Kawrakow
2024-10-24 | Fix quantized k-cache without FA (#105)                                     | Kawrakow
2024-10-22 | Enable q6_0 for flash attention (#101)                                      | Kawrakow
2024-10-21 | Enable IQ4_NL for KV-cache in token generation using Flash Attention (#99)  | Kawrakow
2024-10-16 | Adding IQ4_KSS: 4.0 bpw quants (#89)                                        | Kawrakow
2024-10-13 | IQ2_KS: 2.1875 bpw non-linear quantization (#85)                            | Kawrakow
2024-10-09 | New SOTA quantization: 4.25 bpw IQ4_KS (#83)                                | Kawrakow
2024-10-04 | Move scale fudge factors to quantization (#81)                              | Kawrakow
2024-10-02 | Fused unary(x)*y (#70)                                                      | Kawrakow
2024-10-02 | Adding Q6_0 (#77)                                                           | Kawrakow
2024-10-01 | CUDA: faster float -> iq4_nl conversion (#73)                               | Kawrakow
2024-09-29 | Allow bf16 kv-cache (#69)                                                   | Kawrakow
2024-09-28 | CUDA non-contiguous RoPE (#66)                                              | Kawrakow
2024-09-28 | Adding SWIGLU unary op (#65)                                                | Kawrakow
2024-09-27 | Adding ability to have meta data per tensor row (#61)                       | Kawrakow
2024-09-14 | Adding bf16 support to CUDA (#40)                                           | Kawrakow
2024-09-09 | Add CUDA support for IQ1_TN (#45)                                           | Kawrakow
2024-09-08 | Adding fused rms_norm (#42)                                                 | Kawrakow
2024-08-27 | Faster Gemma2 (#27)                                                         | Kawrakow
2024-08-20 | Fused soft cap and SIMD-ified GeLU (#9)                                     | Kawrakow
2024-08-12 | Merge mainline - Aug 12 2024 (#17)                                          | Kawrakow
2024-08-09 | Fix Zen4 implementation of iq3_k, iq4_k, iq5_k                              | Iwan Kawrakow
2024-08-09 | iq6_k: CUDA dot product                                                     | Iwan Kawrakow
2024-08-09 | iq6_k: CUDA dequantize                                                      | Iwan Kawrakow
2024-08-09 | iq6_k: WIP (nothing works)                                                  | Iwan Kawrakow
2024-08-07 | Adding IQ2_TN for use with ternary models (#13)                             | Kawrakow
2024-08-01 | Add copyright notice                                                        | Iwan Kawrakow
2024-08-01 | iq3_k: faster CUDA dot product                                              | Iwan Kawrakow
2024-08-01 | iq3_k: CUDA dot product                                                     | Iwan Kawrakow
2024-08-01 | iq3_k: Basics                                                               | Iwan Kawrakow
2024-08-01 | iq2_k: very slightly better CUDA dot product                                | Iwan Kawrakow
2024-08-01 | iq2_k: better CUDA dot product                                              | Iwan Kawrakow
2024-08-01 | iq2_k: CUDA dot product finally works                                       | Iwan Kawrakow
2024-08-01 | iq5_k: CUDA dot product finally works                                       | Iwan Kawrakow
2024-08-01 | Factor out iqk CUDA dot products                                            | Iwan Kawrakow
2024-08-01 | iq5_k: CUDA dot product still not working                                   | Iwan Kawrakow
2024-08-01 | iq5_k: Basics                                                               | Iwan Kawrakow
2024-08-01 | iq2_k: Basics                                                               | Iwan Kawrakow
2024-07-28 | IQ4_K: SOTA 4-bit quantization (#6)                                         | Kawrakow
2024-07-27 | Merge mainline llama.cpp (#3)                                               | Kawrakow