Age | Commit message | Author
2025-07-20 | Webui: New Features for Conversations, Settings, and Chat Messages (#618) [main] | firecoperana
2025-07-20 | Adding IQ1_KT - 1.75 bpw SOTA quants (#616) | Kawrakow
2025-07-20 | IQ1_M GEMM for ARM_NEON (#631) | Kawrakow
2025-07-18 | Remove forgotten change | Iwan Kawrakow
2025-07-18 | GEMM for iq1_m (#630) | Kawrakow
2025-07-17 | Add GGML_MAX_CONTEXTS definition in CMakeLists.txt (#622) | Thireus ☠
2025-07-17 | Bump Windows max open files from 512 to 2048 (#620) | Thireus ☠
2025-07-16 | Fixup kimi-k2 convert indentation (#617) | ubergarm
2025-07-16 | Bump GGML_MAX_CONTEXTS to allow loading more shards (#611) | Thireus ☠
2025-07-15 | kimi-k2 convert script and chat template (#612) | ubergarm
2025-07-15 | Vulkan: a fresh start (#608) | Kawrakow
2025-07-14 | Adding IQ2_KL (#602) | Kawrakow
2025-07-14 | Ported kimi-k2 support from llama.cpp (#609) | Aleksey Nikiforov
2025-07-13 | Add iq3_ks to constants.py (#606) | Kawrakow
2025-07-13 | Fix attn_v conditionality (#604) | Nexes the Elder
2025-07-13 | Check if MMQ should be used before using it (#603) | Kawrakow
2025-07-10 | Support for dots.llm1 models (#573) | saood06
2025-07-10 | CUDA: Faster prompt processing for several quantization types (#595) | Kawrakow
2025-07-09 | add hunyuan moe support for 561 (#565) | ubergarm
2025-07-08 | Faster prompt processing for IQ2_KS, IQ2_K, IQ2_K_R4 (#593) | Kawrakow
2025-07-07 | CUDA: small PP performance improvement for MoE models (#589) | Kawrakow
2025-07-06 | Special handling of Seed Coder FIM tokens (#585) | Fizz~
2025-07-06 | Fix server crash when there is no DRY sampler (#588) | firecoperana
2025-07-05 | Vulkan: flash attention for DeepSeek models (#584) | Kawrakow
2025-07-04 | Adding forgotten file (#583) | Kawrakow
2025-07-04 | Vulkan: adding GGML_OP_MULTI_ADD implementation (#582) | Kawrakow
2025-07-03 | Vulkan: Disable multi-add for now (#581) | Kawrakow
2025-07-03 | Vulkan: add GGML_OP_FUSED_MUL_UNARY (#580) | Kawrakow
2025-07-03 | Vulkan: fused rms norm (#577) | Kawrakow
2025-07-03 | Do not crash when there is no DRY sampler (#578) | Kawrakow
2025-07-03 | Fix debug build failure with RPC off (#579) | Kawrakow
2025-07-03 | Chnage KQ mask padding to 64 (#574) | Kawrakow
2025-07-02 | Fix CMakeLists (#571) | Kawrakow
2025-07-02 | Adding IQ3_KS quants (#566) | Kawrakow
2025-07-02 | Minor CUDA PP speed improvement (#567) | Kawrakow
2025-07-02 | Conditionally disable fused ops when building with Vulkan enabled (#569) | Kawrakow
2025-07-02 | Merge vulkan code from mainline up to commit of 6/28/2025 (#563) | firecoperana
2025-06-27 | Remove what appears to be unnecessary asserts in ggml_cuda_cpy (#560) | Kawrakow
2025-06-27 | Use cuBLAS for large batches and quants with block size 16 (#559) | Kawrakow
2025-06-26 | CUDA: MMQ for iqX_r4 quants (#557) | Kawrakow
2025-06-26 | Add Falcon-Edge support (#555) | Kawrakow
2025-06-24 | Much faster prompt processing for IQ1_S and IQ1_M on ARM_NEON (#553) | Kawrakow
2025-06-24 | Much faster prompt processing for k-quants (ARM_NEON) (#552) | Kawrakow
2025-06-23 | Much faster prompt processing for I-quants (ARM_NEON) (#550) | Kawrakow
2025-06-23 | Much faster prompt processing for IQK quants (ARM_NEON) (#549) | Kawrakow
2025-06-22 | To use GGML_ABORT we need to include ggml-impl.h. | Iwan Kawrakow
2025-06-22 | Abort if IQK_IMPLEMENT is not defined | Iwan Kawrakow
2025-06-21 | Faster ARM_NEON GEMM implementation for legacy quants (#546) | Kawrakow
2025-06-21 | Perhaps slightly faster trellis quants (#541) | Kawrakow
2025-06-20 | New integer trellis on ARM_NEON (#544) | Kawrakow