path: root/ggml/src
Age  Commit message  Author
2025-02-15  Bug fix in activation quantization  Iwan Kawrakow
2025-02-15  Moving 4D gemm logic from ggml.c to iqk_mul_mat.cpp (#207)  Kawrakow
2025-02-12  Fix iqk_mul_mat on AVX512 systems that are missing BF16 support (#204)  Kawrakow
2025-02-11  DeepSeek FA support (CPU only) (#200)  Kawrakow
2025-02-09  Add optional MLA (#188)  Kawrakow
2025-02-09  FA: Add option to build all FA kernels (#197)  Kawrakow
2025-02-09  Use Q8_K_128 for IQ1_S_R4 and IQ1_M_R4 matrix multiplications (#194)  Kawrakow
2025-02-08  Revert #79 (#192)  Kawrakow
2025-02-07  cuda: non-contiguous rms norm (#190)  Kawrakow
2025-02-07  Add additional checks for iq1_s_r4 quantization (#191)  Kawrakow
2025-02-06  Rename q4_0_r4, q8_0_r4 and iq4_xs_r4 to _r8 (#189)  Kawrakow
2025-02-06  IQ1_M_R4: better 1.75 bpw quants (#187)  Kawrakow
2025-02-05  iq1_s_r4: slightly faster NEON gemm/gemv (#186)  Kawrakow
2025-02-05  IQ1_S_R4: better 1.5 bpw quants (#185)  Kawrakow
2025-01-30  Deepseek-Lite (#184)  Kawrakow
2025-01-30  Faster Q4_K_R4 and Q5_K_R4 on AVX2/Zen4 (#182)  Kawrakow
2025-01-29  Various (#181)  Kawrakow
2025-01-27  Minor performance improvements (#179)  Kawrakow
2025-01-27  Interleave 8 rows (Q8_0, IQ4_XS) (#178)  Kawrakow
2025-01-22  Better BF16 support on AVX2 (#175)  Kawrakow
2025-01-21  On Zen4 repack fp16 models to bf16_r16 when run-time-repacking is requested (...  Kawrakow
2025-01-20  More Flash Attention improvements (#173)  Kawrakow
2025-01-15  CPU Flash Attention improvements (#172)  Kawrakow
2025-01-12  Fix the strange FA behavior with odd/even batch sizes (#171)  Kawrakow
2025-01-12  MoE fix for R4 quants (#170)  Kawrakow
2025-01-10  Be able to re-quantize MS BitNet I2_S models (#169)  Kawrakow
2025-01-10  Falcon3 changes (#168)  Kawrakow
2024-12-23  iq4_0_r4: Use AVX2 version for matrix x vector (#163)  Kawrakow
2024-12-23  IQ3_S_R4 (#162)  Kawrakow
2024-12-23  MSVC fixes (#161)  Kawrakow
2024-12-22  Faster R4 legacy quants (#158)  Kawrakow
2024-12-22  R4 i-quants improvements (#157)  Kawrakow
2024-12-21  IQ2_S_R4 (#156)  Kawrakow
2024-12-21  IQ2_XS_R4 (#155)  Kawrakow
2024-12-20  IQ2_XXS_R4 (#154)  Kawrakow
2024-12-20  fix typo (#151)  Nexes the Elder
2024-12-20  IQ3_XXS_R4 (#153)  Kawrakow
2024-12-18  IQ4_KS_R4 (#150)  Kawrakow
2024-12-18  IQ5_K_R4 (#149)  Kawrakow
2024-12-17  Slightly better matrix x vector on Zen4/AVX2 for iq2_k_r4, iq3_k_r4, iq4_k_r4...  Kawrakow
2024-12-17  Be able to repack tensors at run time (#147)  Kawrakow
2024-12-17  IQ2_K_R4 (#146)  Kawrakow
2024-12-17  IQ3_K_R4 (#145)  Kawrakow
2024-12-16  Slightly faster IQ4_K_R4 on AVX2/Zen4 (#144)  Kawrakow
2024-12-16  Slightly faster IQ4_XS_R4 on AVX2 (#143)  Kawrakow
2024-12-16  q8_k_r8: this change for NEON got lost?  Iwan Kawrakow
2024-12-15  BF16_R16 - 16 interleaved bf16 rows (#142)  Kawrakow
2024-12-14  Q8_K_R8: Fastest quantized matrix multiplications (#141)  Kawrakow
2024-12-13  Faster R4 quants on Zen4 (#139)  Kawrakow
2024-12-13  Another fix  Iwan Kawrakow