path: root/ggml/src
Age | Commit message | Author
2025-07-20 | Adding IQ1_KT - 1.75 bpw SOTA quants (#616) | Kawrakow
2025-07-20 | IQ1_M GEMM for ARM_NEON (#631) | Kawrakow
2025-07-18 | Remove forgotten change | Iwan Kawrakow
2025-07-18 | GEMM for iq1_m (#630) | Kawrakow
2025-07-17 | Add GGML_MAX_CONTEXTS definition in CMakeLists.txt (#622) | Thireus ☠
2025-07-15 | Vulkan: a fresh start (#608) | Kawrakow
2025-07-14 | Adding IQ2_KL (#602) | Kawrakow
2025-07-13 | Check if MMQ should be used before using it (#603) | Kawrakow
2025-07-10 | CUDA: Faster prompt processing for several quantization types (#595) | Kawrakow
2025-07-08 | Faster prompt processing for IQ2_KS, IQ2_K, IQ2_K_R4 (#593) | Kawrakow
2025-07-07 | CUDA: small PP performance improvement for MoE models (#589) | Kawrakow
2025-07-05 | Vulkan: flash attention for DeepSeek models (#584) | Kawrakow
2025-07-04 | Adding forgotten file (#583) | Kawrakow
2025-07-04 | Vulkan: adding GGML_OP_MULTI_ADD implementation (#582) | Kawrakow
2025-07-03 | Vulkan: Disable multi-add for now (#581) | Kawrakow
2025-07-03 | Vulkan: add GGML_OP_FUSED_MUL_UNARY (#580) | Kawrakow
2025-07-03 | Vulkan: fused rms norm (#577) | Kawrakow
2025-07-03 | Fix debug build failure with RPC off (#579) | Kawrakow
2025-07-02 | Fix CMakeLists (#571) | Kawrakow
2025-07-02 | Adding IQ3_KS quants (#566) | Kawrakow
2025-07-02 | Minor CUDA PP speed improvement (#567) | Kawrakow
2025-07-02 | Merge vulkan code from mainline up to commit of 6/28/2025 (#563) | firecoperana
2025-06-27 | Remove what appears to be unnecessary asserts in ggml_cuda_cpy (#560) | Kawrakow
2025-06-27 | Use cuBLAS for large batches and quants with block size 16 (#559) | Kawrakow
2025-06-26 | CUDA: MMQ for iqX_r4 quants (#557) | Kawrakow
2025-06-24 | Much faster prompt processing for IQ1_S and IQ1_M on ARM_NEON (#553) | Kawrakow
2025-06-24 | Much faster prompt processing for k-quants (ARM_NEON) (#552) | Kawrakow
2025-06-23 | Much faster prompt processing for I-quants (ARM_NEON) (#550) | Kawrakow
2025-06-23 | Much faster prompt processing for IQK quants (ARM_NEON) (#549) | Kawrakow
2025-06-22 | To use GGML_ABORT we need to include ggml-impl.h. | Iwan Kawrakow
2025-06-22 | Abort if IQK_IMPLEMENT is not defined | Iwan Kawrakow
2025-06-21 | Faster ARM_NEON GEMM implementation for legacy quants (#546) | Kawrakow
2025-06-21 | Perhaps slightly faster trellis quants (#541) | Kawrakow
2025-06-20 | New integer trellis on ARM_NEON (#544) | Kawrakow
2025-06-19 | Fix NEON build (#542) | Kawrakow
2025-06-19 | Update CMakeLists.txt to fix NDEBUG handling (#537) | Anton Sokolchenko
2025-06-19 | Fix missed block_q8_x2 bf16 -> i16 change (#540) | Kawrakow
2025-06-18 | Fix KT Neon / ARM typo (#536) | Louie Helm
2025-06-18 | Fix MSVC compilation error | Iwan Kawrakow
2025-06-18 | New IQ2_KT, IQ3_KT and IQ4_KT, V2 (#529) | Kawrakow
2025-06-18 | Much faster CPU prompt processing (part 3) (#534) | Kawrakow
2025-06-18 | Much faster CPU prompt processing (part 2) (#533) | Kawrakow
2025-06-17 | Much faster CPU prompt processing (part 1) (#531) | Kawrakow
2025-06-14 | Call iqk_convert_repack in MoE GEMM (#528) | Kawrakow
2025-06-13 | Faster CPU prompt processing for Q4_K and Q5_K (#525) | Kawrakow
2025-06-13 | Perhaps a slightly better version for IQ2_XXS, IQ3_XXS, IQ3_S GEMV (#524) | Kawrakow
2025-06-12 | Better strategy for GPU offload (#520) | Kawrakow
2025-06-12 | iq3_s: much faster GEMM via repacking to q8_0_r8 (#518) | Kawrakow
2025-06-11 | Faster iq1_s GEMM via repacking to Q8_0_R8 (#517) | Kawrakow
2025-06-11 | Much faster iq3_xxs GEMM via repacking to q8_0_r8 (AVX2) (#516) | Kawrakow