Age | Commit message | Author |
2025-06-11 | IQ2_XXS: much faster CPU prompt processing (#515) | Kawrakow |
2025-06-10 | Fix Compile error (C2668) (#508) | Gaolingx |
2025-06-08 | Fix non rpc build error (#506) | firecoperana |
2025-06-08 | Revert "Rpc improvement (#480)" | Iwan Kawrakow |
2025-06-08 | Rpc improvement (#480) | firecoperana |
2025-06-07 | Fix #499 (#501) | Kawrakow |
2025-06-05 | IQ1_M_R4 CUDA implementation (#494) | Kawrakow |
2025-06-05 | MMQ implementation for IQ4_KS_R4 and IQ5_KS_R4 (#493) | Kawrakow |
2025-06-05 | Faster CPU prompt processing for Trellis quants and MoE models (#488) | Kawrakow |
2025-06-05 | CUDA implementation for IQ1_S_R4 (#492) | Kawrakow |
2025-06-01 | Minor (~2%) iq2_ks TG performance improvement on CUDA (#468) | Kawrakow |
2025-06-01 | Trellis quants: faster CPU prompt processing (#482) | Kawrakow |
2025-06-01 | Metal implementation for the trellis quants. (#475) | Kawrakow |
2025-05-29 | NEON implementation for trellis quants (#471) | Kawrakow |
2025-05-27 | CUDA GEMM and GEMV for IQ4_KS_R4 and IQ5_KS_R4 (#462) | Kawrakow |
2025-05-26 | CUDA implementation for IQ2_K_R4, IQ3_K_R4, IQ4_K_R4, IQ5_K_R4 (#461) | Kawrakow |
2025-05-24 | Legacy quants conversion schemes in convert_hf_to_gguf.py (#449) | Nexes the Elder |
2025-05-24 | Faster IQ3_KT and IQ4_KT (#453) | Kawrakow |
2025-05-23 | Fix bug in MMVQ kernel (#446) | Kawrakow |
2025-05-23 | Fix MSVC compilation (#448) | Kawrakow |
2025-05-23 | Trellis quants with CPU inference (#441) | Andrew Chan |
2025-05-22 | Refactor iqk_mul_mat.cpp (#435) | Kawrakow |
2025-05-20 | Bug fixes from mainline (#439) | Kawrakow |
2025-05-18 | Forgotten MMQ ref and typo (#431) | Nexes the Elder |
2025-05-17 | Option to enable/disable the IQK CPU FA kernels (#429) | Kawrakow |
2025-05-17 | Zen4: Faster PP for IQ2_KS, IQ4_KS, IQ5_KS (#428) | Kawrakow |
2025-05-17 | IQ5_KS_R4: row-interleaved IQ5_KS (#426) | Kawrakow |
2025-05-16 | Fix AVX2 implementation of IQ4_K, IQ4_KS, IQ5_K, IQ6_K (#427) | Kawrakow |
2025-05-15 | Adding forgotten template instance for iq5_ks (#424) | Kawrakow |
2025-05-15 | Adding IQ5_KS - 5.25 bpw quants (#422) | Kawrakow |
2025-05-15 | Fix standard attention on the CPU (#421) | Kawrakow |
2025-05-15 | CUDA: quantized GEMM for IQ2_KS, IQ2_K, IQ3_K (#418) | Kawrakow |
2025-05-14 | CUDA: quantized GEMM for IQ4_K, IQ5_K, IQ6_K (#417) | Kawrakow |
2025-05-14 | Fix SER (CUDA) (#416) | Kawrakow |
2025-05-13 | Fix SER (CPU) (#415) | Kawrakow |
2025-05-13 | Better CPU FA performance for DeepSeek-Lite (#410) | Kawrakow |
2025-05-12 | Fix new CUDA FA on Turing (#413) | Kawrakow |
2025-05-12 | Faster DeepSeek FA on CUDA (#408) | Kawrakow |
2025-05-12 | GPU offload policy (#405) | Kawrakow |
2025-05-11 | Revert "Fix race in the CUDA DeepSeek FA kernel (#406)" | Iwan Kawrakow |
2025-05-11 | Fix race in the CUDA DeepSeek FA kernel (#406) | Kawrakow |
2025-05-10 | TG improvements for MoE models (#404) | Kawrakow |
2025-05-09 | Fix CUDA FlashMLA-3 with quantized KV cache (#400) | Kawrakow |
2025-05-07 | FlashMLA-3 for DeepSeek models on CUDA (#386) | Kawrakow |
2025-05-07 | Fix DeepSeek q8_0 cache (#391) | Kawrakow |
2025-05-07 | Fix build for Xeon Gold 6226R (#390) | Kawrakow |
2025-05-05 | Fix DeepSeek FA (#382) | Kawrakow |
2025-05-04 | CUDA: MMQ for IQ4_KS (#374) | Kawrakow |
2025-05-04 | CUDA: faster FA TG for GQA models (#370) | Kawrakow |
2025-05-04 | Another attempt to fix #367 (#371) | Kawrakow |