ik_llama.cpp.git (branch: main)
path: root/ggml/src
Age | Commit message | Author
2025-07-20 | Adding IQ1_KT - 1.75 bpw SOTA quants (#616) | Kawrakow
2025-07-20 | IQ1_M GEMM for ARM_NEON (#631) | Kawrakow
2025-07-18 | Remove forgotten change | Iwan Kawrakow
2025-07-18 | GEMM for iq1_m (#630) | Kawrakow
2025-07-17 | Add GGML_MAX_CONTEXTS definition in CMakeLists.txt (#622) | Thireus ☠
2025-07-15 | Vulkan: a fresh start (#608) | Kawrakow
2025-07-14 | Adding IQ2_KL (#602) | Kawrakow
2025-07-13 | Check if MMQ should be used before using it (#603) | Kawrakow
2025-07-10 | CUDA: Faster prompt processing for several quantization types (#595) | Kawrakow
2025-07-08 | Faster prompt processing for IQ2_KS, IQ2_K, IQ2_K_R4 (#593) | Kawrakow
2025-07-07 | CUDA: small PP performance improvement for MoE models (#589) | Kawrakow
2025-07-05 | Vulkan: flash attention for DeepSeek models (#584) | Kawrakow
2025-07-04 | Adding forgotten file (#583) | Kawrakow
2025-07-04 | Vulkan: adding GGML_OP_MULTI_ADD implementation (#582) | Kawrakow
2025-07-03 | Vulkan: Disable multi-add for now (#581) | Kawrakow
2025-07-03 | Vulkan: add GGML_OP_FUSED_MUL_UNARY (#580) | Kawrakow
2025-07-03 | Vulkan: fused rms norm (#577) | Kawrakow
2025-07-03 | Fix debug build failure with RPC off (#579) | Kawrakow
2025-07-02 | Fix CMakeLists (#571) | Kawrakow
2025-07-02 | Adding IQ3_KS quants (#566) | Kawrakow
2025-07-02 | Minor CUDA PP speed improvement (#567) | Kawrakow
2025-07-02 | Merge vulkan code from mainline up to commit of 6/28/2025 (#563) | firecoperana
2025-06-27 | Remove what appears to be unnecessary asserts in ggml_cuda_cpy (#560) | Kawrakow
2025-06-27 | Use cuBLAS for large batches and quants with block size 16 (#559) | Kawrakow
2025-06-26 | CUDA: MMQ for iqX_r4 quants (#557) | Kawrakow
2025-06-24 | Much faster prompt processing for IQ1_S and IQ1_M on ARM_NEON (#553) | Kawrakow
2025-06-24 | Much faster prompt processing for k-quants (ARM_NEON) (#552) | Kawrakow
2025-06-23 | Much faster prompt processing for I-quants (ARM_NEON) (#550) | Kawrakow
2025-06-23 | Much faster prompt processing for IQK quants (ARM_NEON) (#549) | Kawrakow
2025-06-22 | To use GGML_ABORT we need to include ggml-impl.h. | Iwan Kawrakow
2025-06-22 | Abort if IQK_IMPLEMENT is not defined | Iwan Kawrakow
2025-06-21 | Faster ARM_NEON GEMM implementation for legacy quants (#546) | Kawrakow
2025-06-21 | Perhaps slightly faster trellis quants (#541) | Kawrakow
2025-06-20 | New integer trellis on ARM_NEON (#544) | Kawrakow
2025-06-19 | Fix NEON build (#542) | Kawrakow
2025-06-19 | Update CMakeLists.txt to fix NDEBUG handling (#537) | Anton Sokolchenko
2025-06-19 | Fix missed block_q8_x2 bf16 -> i16 change (#540) | Kawrakow
2025-06-18 | Fix KT Neon / ARM typo (#536) | Louie Helm
2025-06-18 | Fix MSVC compilation error | Iwan Kawrakow
2025-06-18 | New IQ2_KT, IQ3_KT and IQ4_KT, V2 (#529) | Kawrakow
2025-06-18 | Much faster CPU prompt processing (part 3) (#534) | Kawrakow
2025-06-18 | Much faster CPU prompt processing (part 2) (#533) | Kawrakow
2025-06-17 | Much faster CPU prompt processing (part 1) (#531) | Kawrakow
2025-06-14 | Call iqk_convert_repack in MoE GEMM (#528) | Kawrakow
2025-06-13 | Faster CPU prompt processing for Q4_K and Q5_K (#525) | Kawrakow
2025-06-13 | Perhaps a slightly better version for IQ2_XXS, IQ3_XXS, IQ3_S GEMV (#524) | Kawrakow
2025-06-12 | Better strategy for GPU offload (#520) | Kawrakow
2025-06-12 | iq3_s: much faster GEMM via repacking to q8_0_r8 (#518) | Kawrakow
2025-06-11 | Faster iq1_s GEMM via repacking to Q8_0_R8 (#517) | Kawrakow
2025-06-11 | Much faster iq3_xxs GEMM via repacking to q8_0_r8 (AVX2) (#516) | Kawrakow