path: root/src
Age         Commit message  Author
2024-12-18  IQ4_KS_R4 (#150)  Kawrakow
2024-12-18  IQ5_K_R4 (#149)  Kawrakow
2024-12-17  Be able to repack tensors at run time (#147)  Kawrakow
2024-12-17  IQ2_K_R4 (#146)  Kawrakow
2024-12-17  IQ3_K_R4 (#145)  Kawrakow
2024-12-15  BF16_R16 - 16 interleaved bf16 rows (#142)  Kawrakow
2024-12-14  Q8_K_R8: Fastest quantized matrix multiplications (#141)  Kawrakow
2024-12-12  IQ4_K_R4 (#138)  Kawrakow
2024-12-11  Q2_K_R4 (#136)  Kawrakow
2024-12-11  Q3_K_R4 (#134)  Kawrakow
2024-12-10  Q5_K_R4 (#132)  Kawrakow
2024-12-10  Q6_K_R4 (#130)  Kawrakow
2024-12-09  Q4_K_R4 (#129)  Kawrakow
2024-12-08  Rename iq4_nl_x4 to iq4_nl_r4 (#126)  Kawrakow
2024-12-08  R4 improvements on ARM_NEON (#125)  Kawrakow
2024-12-06  iq2_bn_r4: fastest Bitnet CPU implementation on the planet (#124)  Kawrakow
2024-12-04  IQ4_XS_R4 (#123)  Kawrakow
2024-12-03  Q6_0_R4 (#122)  Kawrakow
2024-12-03  Q5_0_R4 (#121)  Kawrakow
2024-12-03  Q8_0_R4 (#120)  Kawrakow
2024-12-02  Q4_0_R4 (#119)  Kawrakow
2024-12-02  IQ4_NL_X4 (#118)  Kawrakow
2024-11-21  Use Q6_0 instead of Q5_1 for tensors incompatible with IQ5_K/Q5_K (#116)  Nexes the Elder
2024-10-31  Faster MoE inference (#112)  Kawrakow
2024-10-26  Use fused mul - unary op also for MoE models (#111)  Kawrakow
2024-10-26  Bitnet: use the fused mul-silu in the FFN network (#110)  Kawrakow
2024-10-25  Bitnet changes (#106)  Kawrakow
2024-10-22  Add support for Granite and GraniteMoE models (#102)  Kawrakow
2024-10-20  Avoid rebuild of GGML graph for each token (#98)  agray3
2024-10-19  Bitnet: make the scale tensors optional (#97)  Kawrakow
2024-10-19  Quant strategies: attn_q Q4 & attn_v Q6 for Llama 3.1 Q5_K_S (#96)  Nexes the Elder
2024-10-18  CLI - Specify GGML_TYPE to quantize for the main tensors. (#91)  Nexes the Elder
2024-10-16  Adding IQ4_KSS: 4.0 bpw quants (#89)  Kawrakow
2024-10-13  IQ2_KS: 2.1875 bpw non-linear quantization (#85)  Kawrakow
2024-10-11  Minor: printf -> LLAMA_LOG_INFO  Iwan Kawrakow
2024-10-10  Better model info (#84)  Kawrakow
2024-10-09  New SOTA quantization: 4.25 bpw IQ4_KS (#83)  Kawrakow
2024-10-02  Fused unary(x)*y (#70)  Kawrakow
2024-10-02  Adding Q6_0 (#77)  Kawrakow
2024-09-29  Allow bf16 kv-cache (#69)  Kawrakow
2024-09-28  Time to fix replace_all (#68)  Kawrakow
2024-09-28  CUDA non-contiguous RoPE (#66)  Kawrakow
2024-09-28  Adding SWIGLU unary op (#65)  Kawrakow
2024-09-28  Better sub-3-bit quantization mixes with a qkv tensor (#64)  Kawrakow
2024-09-27  Adding ability to have meta data per tensor row (#61)  Kawrakow
2024-09-19  Minor  Iwan Kawrakow
2024-09-14  Quantization mixes tweaks (#53)  Kawrakow
2024-09-09  Adding IQ1_TN - 1.6875 bpw for TriLM ternary models (#44)  Kawrakow
2024-09-08  Adding fused rms_norm (#42)  Kawrakow
2024-08-27  Faster Gemma2 (#27)  Kawrakow