| Date | Commit message | Author |
| --- | --- | --- |
| 2025-06-11 | IQ2_XXS: much faster CPU prompt processing (#515) | Kawrakow |
| 2025-06-10 | Fix Compile error (C2668) (#508) | Gaolingx |
| 2025-06-09 | Docs update (#509) | saood06 |
| 2025-06-08 | Fix non rpc build error (#506) | firecoperana |
| 2025-06-08 | Revert "Rpc improvement (#480)" | Iwan Kawrakow |
| 2025-06-08 | Rpc improvement (#480) | firecoperana |
| 2025-06-08 | Update AUTHORS | Kawrakow |
| 2025-06-08 | Webui improvement (#481) | firecoperana |
| 2025-06-07 | Add an endpoint that lists all the saved prompt caches to server (#502) | saood06 |
| 2025-06-07 | Fix #499 (#501) | Kawrakow |
| 2025-06-06 | Make prompt cache saving and restoring MLA aware (#497) | saood06 |
| 2025-06-05 | IQ1_M_R4 CUDA implementation (#494) | Kawrakow |
| 2025-06-05 | MMQ implementation for IQ4_KS_R4 and IQ5_KS_R4 (#493) | Kawrakow |
| 2025-06-05 | Faster CPU prompt processing for Trellis quants and MoE models (#488) | Kawrakow |
| 2025-06-05 | CUDA implementation for IQ1_S_R4 (#492) | Kawrakow |
| 2025-06-03 | Adding top-n-sigma sampler (#489) | Kawrakow |
| 2025-06-03 | Adding the XTC sampler (#486) | Kawrakow |
| 2025-06-03 | convert_hf_to_gguf.py : conversion from hf weights to Q6_0 (#483) | Nexes the Elder |
| 2025-06-01 | Minor (~2%) iq2_ks TG performance improvement on CUDA (#468) | Kawrakow |
| 2025-06-01 | Trellis quants: faster CPU prompt processing (#482) | Kawrakow |
| 2025-06-01 | Metal implementation for the trellis quants. (#475) | Kawrakow |
| 2025-05-31 | forgotten refs and typo (#478) | Nexes the Elder |
| 2025-05-30 | Replace MLA-specific KV cache with the standard KV cache (#469) | Kawrakow |
| 2025-05-29 | NEON implementation for trellis quants (#471) | Kawrakow |
| 2025-05-28 | set cache_prompt default to true (#465) | saood06 |
| 2025-05-27 | CUDA GEMM and GEMV for IQ4_KS_R4 and IQ5_KS_R4 (#462) | Kawrakow |
| 2025-05-26 | CUDA implementation for IQ2_K_R4, IQ3_K_R4, IQ4_K_R4, IQ5_K_R4 (#461) | Kawrakow |
| 2025-05-25 | Add missing gguf-py constants (#458) | Kawrakow |
| 2025-05-24 | Legacy quants conversion schemes in convert_hf_to_gguf.py (#449) | Nexes the Elder |
| 2025-05-24 | Faster IQ3_KT and IQ4_KT (#453) | Kawrakow |
| 2025-05-23 | Fix bug in MMVQ kernel (#446) | Kawrakow |
| 2025-05-23 | Fix MSVC compilation (#448) | Kawrakow |
| 2025-05-23 | Fix typo in non-AVX2 code branch (#445) | Kawrakow |
| 2025-05-23 | Trellis quants with CPU inference (#441) | Andrew Chan |
| 2025-05-23 | gguf-split : update (#444) | Nexes the Elder |
| 2025-05-22 | Streamline a bit the quant strategies (#443) | Nexes the Elder |
| 2025-05-22 | Refactor iqk_mul_mat.cpp (#435) | Kawrakow |
| 2025-05-20 | Bug fixes from mainline (#439) | Kawrakow |
| 2025-05-18 | Forgotten MMQ ref and typo (#431) | Nexes the Elder |
| 2025-05-17 | Option to enable/disable the IQK CPU FA kernels (#429) | Kawrakow |
| 2025-05-17 | Zen4: Faster PP for IQ2_KS, IQ4_KS, IQ5_KS (#428) | Kawrakow |
| 2025-05-17 | IQ5_KS_R4: row-interleaved IQ5_KS (#426) | Kawrakow |
| 2025-05-16 | Fix AVX2 implementation of IQ4_K, IQ4_KS, IQ5_K, IQ6_K (#427) | Kawrakow |
| 2025-05-15 | Adding forgotten template instance for iq5_ks (#424) | Kawrakow |
| 2025-05-15 | Adding IQ5_KS - 5.25 bpw quants (#422) | Kawrakow |
| 2025-05-15 | Fix standard attention on the CPU (#421) | Kawrakow |
| 2025-05-15 | CUDA: quantized GEMM for IQ2_KS, IQ2_K, IQ3_K (#418) | Kawrakow |
| 2025-05-14 | CUDA: quantized GEMM for IQ4_K, IQ5_K, IQ6_K (#417) | Kawrakow |
| 2025-05-14 | Fix SER (CUDA) (#416) | Kawrakow |
| 2025-05-13 | Fix SER (CPU) (#415) | Kawrakow |