path: root/ggml/src
Age | Commit message | Author
2024-09-27 | Adding ability to have meta data per tensor row (#61) | Kawrakow
2024-09-25 | Use fp32 for K*Q in Metal FA implementation (#62) | Kawrakow
2024-09-17 | Fix compiler warnings (#58) | Kawrakow
2024-09-17 | BF16 support on Metal (#56) | Kawrakow
2024-09-16 | iqk_mul_mat(ARM_NEON): adding bf16 support (#41) | Kawrakow
2024-09-15 | Minor | Iwan Kawrakow
2024-09-14 | Adding bf16 support to CUDA (#40) | Kawrakow
2024-09-14 | Improve Q5_0 performance (#55) | Kawrakow
2024-09-14 | Improve Q4_0 and Q8_0 performance on AVX2/Zen4 (#54) | Kawrakow
2024-09-13 | Minor | Iwan Kawrakow
2024-09-13 | Fix bug and D < 128 case for Q8_0 k-cache (#52) | Kawrakow
2024-09-12 | Quantized Flash Attention for all supported CPU platforms (#51) | Kawrakow
2024-09-11 | AVX2 Flash Attention 2 (#50) | Kawrakow
2024-09-11 | ARM_NEON Flash Attention (#49) | Kawrakow
2024-09-10 | AVX2 Flash Attention (#48) | Kawrakow
2024-09-10 | iq2_tn: slightly better performance on AVX2 (#47) | Kawrakow
2024-09-10 | IQ1_TN Metal implementation (#46) | Kawrakow
2024-09-09 | Add CUDA support for IQ1_TN (#45) | Kawrakow
2024-09-09 | Adding IQ1_TN - 1.6875 bpw for TriLM ternary models (#44) | Kawrakow
2024-09-08 | iq2_tn: slightly faster PP (#43) | Kawrakow
2024-09-08 | Adding fused rms_norm (#42) | Kawrakow
2024-09-05 | Add support for bf16 to iqk_mul_mat (#39) | Kawrakow
2024-09-05 | Zen4 Flash Attention - bf16 support (#38) | Kawrakow
2024-09-04 | Performance improvements for legacy quants on ARM_NEON (#37) | Kawrakow
2024-09-04 | Zen4 Flash Attention 2 (#36) | Kawrakow
2024-09-02 | Fix Zen4 Flash Attention (#35) | Kawrakow
2024-09-01 | Zen4 Flash Attention (#32) | Kawrakow
2024-08-31 | Fix build when iqk_mul_mat is disabled (#31) | Kawrakow
2024-08-27 | Faster Gemma2 (#27) | Kawrakow
2024-08-21 | softcap: minor improvement (#24) | Kawrakow
2024-08-20 | Fused soft cap and SIMD-ified GeLU (#9) | Kawrakow
2024-08-19 | AVX2 quantization for Q8_K (#22) | Kawrakow
2024-08-14 | Skip barriers of noops (#19) | Kawrakow
2024-08-12 | Merge mainline - Aug 12 2024 (#17) | Kawrakow
2024-08-09 | Fix Makefile | Iwan Kawrakow
2024-08-09 | Fix Zen4 implementation of iq3_k, iq4_k, iq5_k | Iwan Kawrakow
2024-08-09 | iq6_k: AVX2 | Iwan Kawrakow
2024-08-09 | iq6_k: Metal | Iwan Kawrakow
2024-08-09 | iq6_k: NEON | Iwan Kawrakow
2024-08-09 | iq6_k: slightly better Zen4 iqk_mul_mat | Iwan Kawrakow
2024-08-09 | iq6_k: Zen4 iqk_mul_mat | Iwan Kawrakow
2024-08-09 | iq6_k: CUDA dot product | Iwan Kawrakow
2024-08-09 | iq6_k: CUDA dequantize | Iwan Kawrakow
2024-08-09 | iq6_k: WIP (quantize/dequantize) | Iwan Kawrakow
2024-08-09 | iq6_k: WIP (nothing works) | Iwan Kawrakow
2024-08-07 | Adding IQ2_TN for use with ternary models (#13) | Kawrakow
2024-08-05 | q2_K: allow it to detect ternary nets and quantize accordingly | Iwan Kawrakow
2024-08-05 | iq3_k, iq5_k: faster quantization | Iwan Kawrakow
2024-08-03 | iq4_k: speedup quantization by a factor of ~2 | Iwan Kawrakow
2024-08-01 | Add copyright notice | Iwan Kawrakow