Age | Commit message | Author
2024-12-03 | Q8_0_R4 (#120) | Kawrakow
2024-12-02 | Q4_0_R4 (#119) | Kawrakow
2024-12-02 | IQ4_NL_X4 (#118) | Kawrakow
2024-11-21 | Use Q6_0 instead of Q5_1 for tensors incompatible with IQ5_K/Q5_K (#116) | Nexes the Elder
2024-11-21 | MMQ for Q6_0 (#115) | Kawrakow
2024-10-31 | Faster MoE inference (#112) | Kawrakow
2024-10-26 | Use fused mul - unary op also for MoE models (#111) | Kawrakow
2024-10-26 | Bitnet: use the fused mul-silu in the FFN network (#110) | Kawrakow
2024-10-26 | Bitnet CUDA improvements (#109) | Kawrakow
2024-10-26 | Improve Bitnet PP on Metal (#108) | Kawrakow
2024-10-26 | Faster IQ1_BN Metal implementation (#107) | Kawrakow
2024-10-25 | Remove forgotten IQ1_TN, IQ2_TN enum values | Iwan Kawrakow
2024-10-25 | Bitnet changes (#106) | Kawrakow
2024-10-24 | Fix quantized k-cache without FA (#105) | Kawrakow
2024-10-22 | Add support for Granite and GraniteMoE models (#102) | Kawrakow
2024-10-22 | Enable q6_0 for flash attention (#101) | Kawrakow
2024-10-21 | Enable IQ4_NL for KV-cache in token generation using Flash Attention (#99) | Kawrakow
2024-10-20 | Avoid rebuild of GGML graph for each token (#98) | agray3
2024-10-19 | Bitnet: make the scale tensors optional (#97) | Kawrakow
2024-10-19 | Quant strategies: attn_q Q4 & attn_v Q6 for Llama 3.1 Q5_K_S (#96) | Nexes the Elder
2024-10-19 | Attempt to blindly fix Windows build failure (#93) | Kawrakow
2024-10-18 | CLI - Specify GGML_TYPE to quantize for the main tensors. (#91) | Nexes the Elder
2024-10-16 | Adding IQ4_KSS: 4.0 bpw quants (#89) | Kawrakow
2024-10-16 | iq4_ks: faster dot product on Metal (#90) | Kawrakow
2024-10-14 | Minor iq3_k tweak | Iwan Kawrakow
2024-10-14 | iq3_k: fix and optimize Metal dot product (#87) | Kawrakow
2024-10-13 | Fix and optimize iq2k Metal implementation (#86) | Kawrakow
2024-10-13 | IQ2_KS: 2.1875 bpw non-linear quantization (#85) | Kawrakow
2024-10-11 | Minor: printf -> LLAMA_LOG_INFO | Iwan Kawrakow
2024-10-10 | Better model info (#84) | Kawrakow
2024-10-09 | New SOTA quantization: 4.25 bpw IQ4_KS (#83) | Kawrakow
2024-10-04 | Fix compiler warnings | Iwan Kawrakow
2024-10-04 | Move scale fudge factors to quantization (#81) | Kawrakow
2024-10-04 | Move to c++17 projectwide (#80) | Kawrakow
2024-10-04 | Do not quantize activations if not necessary (#79) | Kawrakow
2024-10-02 | q6_0: Slightly faster Zen4/AVX2 (#78) | Kawrakow
2024-10-02 | Fused unary(x)*y (#70) | Kawrakow
2024-10-02 | Adding Q6_0 (#77) | Kawrakow
2024-10-02 | iq4_nl: faster quantization (#76) | Kawrakow
2024-10-01 | Fix Q5_0 flash attention (#75) | Kawrakow
2024-10-01 | Fix last commit | Iwan Kawrakow
2024-10-01 | IQ4_NL kv-cache on the CPU (Zen4/AVX2/ARM_NEON) (#74) | Kawrakow
2024-10-01 | CUDA: faster float -> iq4_nl conversion (#73) | Kawrakow
2024-10-01 | iqk_mul_mat: better iq4_nl implementation on Zen4/AVX2 (#72) | Kawrakow
2024-10-01 | iqk_mul_mat: better strategy when nrc_y not divisible by ny (#71) | Kawrakow
2024-09-29 | Allow bf16 kv-cache (#69) | Kawrakow
2024-09-28 | Time to fix replace_all (#68) | Kawrakow
2024-09-28 | CUDA non-contiguous RoPE (#66) | Kawrakow
2024-09-28 | Adding SWIGLU unary op (#65) | Kawrakow
2024-09-28 | Better sub-3-bit quantization mixes with a qkv tensor (#64) | Kawrakow
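
Several entries above (#65, #70, #110) concern fusing the FFN's unary activation with the multiply that follows it, so silu(gate) * up is produced in one pass rather than as separate SILU and MUL ops. A minimal scalar sketch of what such a fused SWIGLU op computes is below; the function name and signature are illustrative, not the repository's actual API.

```cpp
#include <cmath>
#include <cstddef>

// Illustrative scalar reference (not the repo's kernel):
// out[i] = silu(gate[i]) * up[i], computed in a single pass so the
// intermediate silu(gate) tensor never has to be written to memory
// between two separate ops.
static void fused_swiglu_ref(const float *gate, const float *up,
                             float *out, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        const float g    = gate[i];
        const float silu = g / (1.0f + std::exp(-g)); // silu(g) = g * sigmoid(g)
        out[i] = silu * up[i];
    }
}
```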
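
As a quick sanity check on the bits-per-weight figures advertised for the new quants (2.1875 bpw for IQ2_KS in #85, 4.25 bpw for IQ4_KS in #83): assuming the usual ggml super-block of 256 weights (an assumption; see the quant implementations for the actual layouts), the per-block storage works out as follows.

```cpp
#include <cstdio>

int main() {
    const int    block_size = 256;          // assumed ggml super-block size
    const double bpw[]  = {2.1875, 4.25};   // IQ2_KS (#85), IQ4_KS (#83)
    const char  *name[] = {"IQ2_KS", "IQ4_KS"};
    for (int i = 0; i < 2; ++i) {
        // bits per block divided by 8 gives bytes per block:
        // 70 bytes for IQ2_KS, 136 bytes for IQ4_KS
        printf("%s: %.4f bpw -> %.0f bytes per %d-weight block\n",
               name[i], bpw[i], bpw[i] * block_size / 8, block_size);
    }
    return 0;
}
```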