path: root/ggml/src
Age | Commit message | Author
2024-10-19 | Attempt to blindly fix Windows build failure (#93) | Kawrakow
2024-10-16 | Adding IQ4_KSS: 4.0 bpw quants (#89) | Kawrakow
2024-10-16 | iq4_ks: faster dot product on Metal (#90) | Kawrakow
2024-10-14 | Minor iq3_k tweak | Iwan Kawrakow
2024-10-14 | iq3_k: fix and optimize Metal dot product (#87) | Kawrakow
2024-10-13 | Fix and optimize iq2k Metal implementation (#86) | Kawrakow
2024-10-13 | IQ2_KS: 2.1875 bpw non-linear quantization (#85) | Kawrakow
2024-10-09 | New SOTA quantization: 4.25 bpw IQ4_KS (#83) | Kawrakow
2024-10-04 | Fix compiler warnings | Iwan Kawrakow
2024-10-04 | Move scale fudge factors to quantization (#81) | Kawrakow
2024-10-04 | Move to C++17 project-wide (#80) | Kawrakow
2024-10-04 | Do not quantize activations if not necessary (#79) | Kawrakow
2024-10-02 | q6_0: Slightly faster Zen4/AVX2 (#78) | Kawrakow
2024-10-02 | Fused unary(x)*y (#70) | Kawrakow
2024-10-02 | Adding Q6_0 (#77) | Kawrakow
2024-10-02 | iq4_nl: faster quantization (#76) | Kawrakow
2024-10-01 | Fix Q5_0 flash attention (#75) | Kawrakow
2024-10-01 | Fix last commit | Iwan Kawrakow
2024-10-01 | IQ4_NL kv-cache on the CPU (Zen4/AVX2/ARM_NEON) (#74) | Kawrakow
2024-10-01 | CUDA: faster float -> iq4_nl conversion (#73) | Kawrakow
2024-10-01 | iqk_mul_mat: better iq4_nl implementation on Zen4/AVX2 (#72) | Kawrakow
2024-10-01 | iqk_mul_mat: better strategy when nrc_y not divisible by ny (#71) | Kawrakow
2024-09-29 | Allow bf16 kv-cache (#69) | Kawrakow
2024-09-28 | CUDA non-contiguous RoPE (#66) | Kawrakow
2024-09-28 | Adding SWIGLU unary op (#65) | Kawrakow
2024-09-27 | Adding ability to have meta data per tensor row (#61) | Kawrakow
2024-09-25 | Use fp32 for K*Q in Metal FA implementation (#62) | Kawrakow
2024-09-17 | Fix compiler warnings (#58) | Kawrakow
2024-09-17 | BF16 support on Metal (#56) | Kawrakow
2024-09-16 | iqk_mul_mat(ARM_NEON): adding bf16 support (#41) | Kawrakow
2024-09-15 | Minor | Iwan Kawrakow
2024-09-14 | Adding bf16 support to CUDA (#40) | Kawrakow
2024-09-14 | Improve Q5_0 performance (#55) | Kawrakow
2024-09-14 | Improve Q4_0 and Q8_0 performance on AVX2/Zen4 (#54) | Kawrakow
2024-09-13 | Minor | Iwan Kawrakow
2024-09-13 | Fix bug and D < 128 case for Q8_0 k-cache (#52) | Kawrakow
2024-09-12 | Quantized Flash Attention for all supported CPU platforms (#51) | Kawrakow
2024-09-11 | AVX2 Flash Attention 2 (#50) | Kawrakow
2024-09-11 | ARM_NEON Flash Attention (#49) | Kawrakow
2024-09-10 | AVX2 Flash Attention (#48) | Kawrakow
2024-09-10 | iq2_tn: slightly better performance on AVX2 (#47) | Kawrakow
2024-09-10 | IQ1_TN Metal implementation (#46) | Kawrakow
2024-09-09 | Add CUDA support for IQ1_TN (#45) | Kawrakow
2024-09-09 | Adding IQ1_TN - 1.6875 bpw for TriLM ternary models (#44) | Kawrakow
2024-09-08 | iq2_tn: slightly faster PP (#43) | Kawrakow
2024-09-08 | Adding fused rms_norm (#42) | Kawrakow
2024-09-05 | Add support for bf16 to iqk_mul_mat (#39) | Kawrakow
2024-09-05 | Zen4 Flash Attention - bf16 support (#38) | Kawrakow
2024-09-04 | Performance improvements for legacy quants on ARM_NEON (#37) | Kawrakow
2024-09-04 | Zen4 Flash Attention 2 (#36) | Kawrakow