path: root/ggml/src
Age        | Commit message | Author
2024-12-13 | Adding lost q4_k_r4 case | Iwan Kawrakow
2024-12-12 | IQ4_K_R4 (#138) | Kawrakow
2024-12-11 | Fix AVX2 implementation of iq4_nl_r4 (#137) | Kawrakow
2024-12-11 | Q2_K_R4 (#136) | Kawrakow
2024-12-11 | Better ARM_NEON implementation for R4 quants (#135) | Kawrakow
2024-12-11 | Q3_K_R4 (#134) | Kawrakow
2024-12-10 | Q5_K_R4 (#132) | Kawrakow
2024-12-10 | Slightly faster Q4_K_R4 and IQ4_XS_R4 on Zen4 (#131) | Kawrakow
2024-12-10 | Q6_K_R4 (#130) | Kawrakow
2024-12-09 | Q4_K_R4 (#129) | Kawrakow
2024-12-08 | Faster IQ4_XS_R4 on Zen4 (#128) | Kawrakow
2024-12-08 | Rename iq4_nl_x4 to iq4_nl_r4 (#126) | Kawrakow
2024-12-08 | R4 improvements on ARM_NEON (#125) | Kawrakow
2024-12-06 | iq2_bn_r4: fastest Bitnet CPU implementation on the planet (#124) | Kawrakow
2024-12-04 | IQ4_XS_R4 (#123) | Kawrakow
2024-12-03 | Q6_0_R4 (#122) | Kawrakow
2024-12-03 | Q5_0_R4 (#121) | Kawrakow
2024-12-03 | Q8_0_R4 (#120) | Kawrakow
2024-12-02 | Q4_0_R4 (#119) | Kawrakow
2024-12-02 | IQ4_NL_X4 (#118) | Kawrakow
2024-11-21 | MMQ for Q6_0 (#115) | Kawrakow
2024-10-31 | Faster MoE inference (#112) | Kawrakow
2024-10-26 | Bitnet CUDA improvements (#109) | Kawrakow
2024-10-26 | Improve Bitnet PP on Metal (#108) | Kawrakow
2024-10-26 | Faster IQ1_BN Metal implementation (#107) | Kawrakow
2024-10-25 | Bitnet changes (#106) | Kawrakow
2024-10-24 | Fix quantized k-cache without FA (#105) | Kawrakow
2024-10-22 | Enable q6_0 for flash attention (#101) | Kawrakow
2024-10-21 | Enable IQ4_NL for KV-cache in token generation using Flash Attention (#99) | Kawrakow
2024-10-20 | Avoid rebuild of GGML graph for each token (#98) | agray3
2024-10-19 | Attempt to blindly fix Windows build failure (#93) | Kawrakow
2024-10-16 | Adding IQ4_KSS: 4.0 bpw quants (#89) | Kawrakow
2024-10-16 | iq4_ks: faster dot product on Metal (#90) | Kawrakow
2024-10-14 | Minor iq3_k tweak | Iwan Kawrakow
2024-10-14 | iq3_k: fix and optimize Metal dot product (#87) | Kawrakow
2024-10-13 | Fix and optimize iq2k Metal implementation (#86) | Kawrakow
2024-10-13 | IQ2_KS: 2.1875 bpw non-linear quantization (#85) | Kawrakow
2024-10-09 | New SOTA quantization: 4.25 bpw IQ4_KS (#83) | Kawrakow
2024-10-04 | Fix compiler warnings | Iwan Kawrakow
2024-10-04 | Move scale fudge factors to quantization (#81) | Kawrakow
2024-10-04 | Move to c++17 projectwide (#80) | Kawrakow
2024-10-04 | Do not quantize activations if not necessary (#79) | Kawrakow
2024-10-02 | q6_0: Slightly faster Zen4/AVX2 (#78) | Kawrakow
2024-10-02 | Fused unary(x)*y (#70) | Kawrakow
2024-10-02 | Adding Q6_0 (#77) | Kawrakow
2024-10-02 | iq4_nl: faster quantization (#76) | Kawrakow
2024-10-01 | Fix Q5_0 flash attention (#75) | Kawrakow
2024-10-01 | Fix last commit | Iwan Kawrakow
2024-10-01 | IQ4_NL kv-cache on the CPU (Zen4/AVX2/ARM_NEON) (#74) | Kawrakow
2024-10-01 | CUDA: faster float -> iq4_nl conversion (#73) | Kawrakow