path: root/src/llama.cpp
Age        | Commit message                                                                 | Author
2025-07-17 | Bump Windows max open files from 512 to 2048 (#620)                            | Thireus ☠
2025-07-15 | kimi-k2 convert script and chat template (#612)                                | ubergarm
2025-07-15 | Vulkan: a fresh start (#608)                                                   | Kawrakow
2025-07-14 | Adding IQ2_KL (#602)                                                           | Kawrakow
2025-07-14 | Ported kimi-k2 support from llama.cpp (#609)                                   | Aleksey Nikiforov
2025-07-13 | Fix attn_v conditionality (#604)                                               | Nexes the Elder
2025-07-10 | Support for dots.llm1 models (#573)                                            | saood06
2025-07-09 | add hunyuan moe support for 561 (#565)                                         | ubergarm
2025-07-06 | Special handling of Seed Coder FIM tokens (#585)                               | Fizz~
2025-07-06 | Fix server crash when there is no DRY sampler (#588)                           | firecoperana
2025-07-04 | Vulkan: adding GGML_OP_MULTI_ADD implementation (#582)                         | Kawrakow
2025-07-03 | Vulkan: Disable multi-add for now (#581)                                       | Kawrakow
2025-07-03 | Vulkan: add GGML_OP_FUSED_MUL_UNARY (#580)                                     | Kawrakow
2025-07-03 | Vulkan: fused rms norm (#577)                                                  | Kawrakow
2025-07-03 | Do not crash when there is no DRY sampler (#578)                               | Kawrakow
2025-07-02 | Adding IQ3_KS quants (#566)                                                    | Kawrakow
2025-07-02 | Conditionally disable fused ops when building with Vulkan enabled (#569)       | Kawrakow
2025-07-02 | Merge vulkan code from mainline up to commit of 6/28/2025 (#563)               | firecoperana
2025-06-26 | Add Falcon-Edge support (#555)                                                 | Kawrakow
2025-06-21 | Faster ARM_NEON GEMM implementation for legacy quants (#546)                   | Kawrakow
2025-06-19 | add dry sampler (#513)                                                         | firecoperana
2025-06-18 | New IQ2_KT, IQ3_KT and IQ4_KT, V2 (#529)                                       | Kawrakow
2025-06-06 | Make prompt cache saving and restoring MLA aware (#497)                        | saood06
2025-06-03 | Adding top-n-sigma sampler (#489)                                              | Kawrakow
2025-06-03 | Adding the XTC sampler (#486)                                                  | Kawrakow
2025-05-31 | forgotten refs and typo (#478)                                                 | Nexes the Elder
2025-05-30 | Replace MLA-specific KV cache with the standard KV cache (#469)                | Kawrakow
2025-05-24 | Legacy quants conversion schemes in convert_hf_to_gguf.py (#449)               | Nexes the Elder
2025-05-23 | Trellis quants with CPU inference (#441)                                       | Andrew Chan
2025-05-22 | Streamline a bit the quant strategies (#443)                                   | Nexes the Elder
2025-05-17 | IQ5_KS_R4: row-interleaved IQ5_KS (#426)                                       | Kawrakow
2025-05-15 | Adding IQ5_KS - 5.25 bpw quants (#422)                                         | Kawrakow
2025-05-12 | Enable faster prompt processing with mainline llama.cpp GGUFs (#409)           | Kawrakow
2025-05-12 | Faster DeepSeek FA on CUDA (#408)                                              | Kawrakow
2025-05-12 | GPU offload policy (#405)                                                      | Kawrakow
2025-05-09 | Handle incompatible DeepSeek GGUFs (#394)                                      | Kawrakow
2025-05-09 | Support for Llama-3-Nemotron models (#377)                                     | saood06
2025-05-02 | Fix model architecture name (#366)                                             | saood06
2025-04-29 | Apply Qwen3 PR from llama.cpp (#355)                                           | Ben Harris
2025-04-26 | Add GLM-4-0414 Model Support (#344)                                            | ubergarm
2025-04-26 | Add support for Cohere2 (#341)                                                 | Kawrakow
2025-04-25 | Fix LLaMA-4 attention (#342)                                                   | Kawrakow
2025-04-22 | BitNet adjustments (#338)                                                      | Kawrakow
2025-04-22 | Add support for bitnet2b_2501 model (#337)                                     | saood06
2025-04-11 | Correct L4 rms_norm (#324)                                                     | Kawrakow
2025-04-10 | LlaMA-4 support (text only) (#321)                                             | Kawrakow
2025-04-08 | Guard against attempts to use MLA for non-MLA models (#320)                    | Kawrakow
2025-04-07 | Add copyright notices (#317)                                                   | Kawrakow
2025-04-01 | Additional guards for interleaved quants (#299)                                | Kawrakow
2025-03-27 | Make sure tensor row size is multiple of block size also when quantizing with... | Kawrakow