path: root/src
Age         Commit message  Author
2024-12-18  IQ4_KS_R4 (#150)  Kawrakow
2024-12-18  IQ5_K_R4 (#149)  Kawrakow
2024-12-17  Be able to repack tensors at run time (#147)  Kawrakow
2024-12-17  IQ2_K_R4 (#146)  Kawrakow
2024-12-17  IQ3_K_R4 (#145)  Kawrakow
2024-12-15  BF16_R16 - 16 interleaved bf16 rows (#142)  Kawrakow
2024-12-14  Q8_K_R8: Fastest quantized matrix multiplications (#141)  Kawrakow
2024-12-12  IQ4_K_R4 (#138)  Kawrakow
2024-12-11  Q2_K_R4 (#136)  Kawrakow
2024-12-11  Q3_K_R4 (#134)  Kawrakow
2024-12-10  Q5_K_R4 (#132)  Kawrakow
2024-12-10  Q6_K_R4 (#130)  Kawrakow
2024-12-09  Q4_K_R4 (#129)  Kawrakow
2024-12-08  Rename iq4_nl_x4 to iq4_nl_r4 (#126)  Kawrakow
2024-12-08  R4 improvements on ARM_NEON (#125)  Kawrakow
2024-12-06  iq2_bn_r4: fastest Bitnet CPU implementation on the planet (#124)  Kawrakow
2024-12-04  IQ4_XS_R4 (#123)  Kawrakow
2024-12-03  Q6_0_R4 (#122)  Kawrakow
2024-12-03  Q5_0_R4 (#121)  Kawrakow
2024-12-03  Q8_0_R4 (#120)  Kawrakow
2024-12-02  Q4_0_R4 (#119)  Kawrakow
2024-12-02  IQ4_NL_X4 (#118)  Kawrakow
2024-11-21  Use Q6_0 instead of Q5_1 for tensors incompatible with IQ5_K/Q5_K (#116)  Nexes the Elder
2024-10-31  Faster MoE inference (#112)  Kawrakow
2024-10-26  Use fused mul - unary op also for MoE models (#111)  Kawrakow
2024-10-26  Bitnet: use the fused mul-silu in the FFN network (#110)  Kawrakow
2024-10-25  Bitnet changes (#106)  Kawrakow
2024-10-22  Add support for Granite and GraniteMoE models (#102)  Kawrakow
2024-10-20  Avoid rebuild of GGML graph for each token (#98)  agray3
2024-10-19  Bitnet: make the scale tensors optional (#97)  Kawrakow
2024-10-19  Quant strategies: attn_q Q4 & attn_v Q6 for Llama 3.1 Q5_K_S (#96)  Nexes the Elder
2024-10-18  CLI - Specify GGML_TYPE to quantize for the main tensors. (#91)  Nexes the Elder
2024-10-16  Adding IQ4_KSS: 4.0 bpw quants (#89)  Kawrakow
2024-10-13  IQ2_KS: 2.1875 bpw non-linear quantization (#85)  Kawrakow
2024-10-11  Minor: printf -> LLAMA_LOG_INFO  Iwan Kawrakow
2024-10-10  Better model info (#84)  Kawrakow
2024-10-09  New SOTA quantization: 4.25 bpw IQ4_KS (#83)  Kawrakow
2024-10-02  Fused unary(x)*y (#70)  Kawrakow
2024-10-02  Adding Q6_0 (#77)  Kawrakow
2024-09-29  Allow bf16 kv-cache (#69)  Kawrakow
2024-09-28  Time to fix replace_all (#68)  Kawrakow
2024-09-28  CUDA non-contiguous RoPE (#66)  Kawrakow
2024-09-28  Adding SWIGLU unary op (#65)  Kawrakow
2024-09-28  Better sub-3-bit quantization mixes with a qkv tensor (#64)  Kawrakow
2024-09-27  Adding ability to have meta data per tensor row (#61)  Kawrakow
2024-09-19  Minor  Iwan Kawrakow
2024-09-14  Quantization mixes tweaks (#53)  Kawrakow
2024-09-09  Adding IQ1_TN - 1.6875 bpw for TriLM ternary models (#44)  Kawrakow
2024-09-08  Adding fused rms_norm (#42)  Kawrakow
2024-08-27  Faster Gemma2 (#27)  Kawrakow