ik_llama.cpp.git (branch: main)
Commit log for path: ggml/src/ggml-cuda

Date        Author           Commit message
----------  ---------------  ---------------------------------------------------------------------------
2025-06-05  Kawrakow         IQ1_M_R4 CUDA implementation (#494)
2025-06-05  Kawrakow         MMQ implementation for IQ4_KS_R4 and IQ5_KS_R4 (#493)
2025-06-05  Kawrakow         CUDA implementation for IQ1_S_R4 (#492)
2025-06-01  Kawrakow         Minor (~2%) iq2_ks TG performance improvement on CUDA (#468)
2025-05-27  Kawrakow         CUDA GEMM and GEMV for IQ4_KS_R4 and IQ5_KS_R4 (#462)
2025-05-26  Kawrakow         CUDA implementation for IQ2_K_R4, IQ3_K_R4, IQ4_K_R4, IQ5_K_R4 (#461)
2025-05-24  Nexes the Elder  Legacy quants conversion schemes in convert_hf_to_gguf.py (#449)
2025-05-23  Kawrakow         Fix bug in MMVQ kernel (#446)
2025-05-23  Andrew Chan      Trellis quants with CPU inference (#441)
2025-05-20  Kawrakow         Bug fixes from mainline (#439)
2025-05-18  Nexes the Elder  Forgotten MMQ ref and typo (#431)
2025-05-15  Kawrakow         Adding forgotten template instance for iq5_ks (#424)
2025-05-15  Kawrakow         Adding IQ5_KS - 5.25 bpw quants (#422)
2025-05-15  Kawrakow         CUDA: quantized GEMM for IQ2_KS, IQ2_K, IQ3_K (#418)
2025-05-14  Kawrakow         CUDA: quantized GEMM for IQ4_K, IQ5_K, IQ6_K (#417)
2025-05-14  Kawrakow         Fix SER (CUDA) (#416)
2025-05-12  Kawrakow         Fix new CUDA FA on Turing (#413)
2025-05-12  Kawrakow         Faster DeepSeek FA on CUDA (#408)
2025-05-11  Iwan Kawrakow    Revert "Fix race in the CUDA DeepSeek FA kernel (#406)"
2025-05-11  Kawrakow         Fix race in the CUDA DeepSeek FA kernel (#406)
2025-05-10  Kawrakow         TG improvements for MoE models (#404)
2025-05-09  Kawrakow         Fix CUDA FlashMLA-3 with quantized KV cache (#400)
2025-05-07  Kawrakow         FlashMLA-3 for DeepSeek models on CUDA (#386)
2025-05-05  Kawrakow         Fix DeepSeek FA (#382)
2025-05-04  Kawrakow         CUDA: MMQ for IQ4_KS (#374)
2025-05-04  Kawrakow         CUDA: faster FA TG for GQA models (#370)
2025-04-24  Kawrakow         cuda: use switch in constexpr funcs (#343)
2025-04-15  Kawrakow         Allow q8_0 KV cache for head size 256 (#330)
2025-04-07  Kawrakow         Add copyright notices (#317)
2025-03-18  Kawrakow         Make Q8_0 KV cache work with mla=2,fa on CUDA (#264)
2025-03-18  Kawrakow         Compile-time option to use bf16 for quants without MMQ kernels (#261)
2025-03-18  Kawrakow         FlashMLA-2: reduce compute buffer size (CUDA and CPU) (#260)
2025-03-12  Kawrakow         MLA-2: Allow usage of q8_0 for KV cache on CUDA (#252)
2025-03-10  Kawrakow         DeepSeek imatrix stuff (#250)
2025-03-10  Kawrakow         Faster MoE token generation on CUDA (#248)
2025-03-05  Kawrakow         DeepSeek CUDA Flash Attention (#241)
2025-03-02  Kawrakow         SER - Smart Expert Reduction (#239)
2025-03-01  Kawrakow         Reduce size of compute buffers (#237)
2025-02-27  Kawrakow         Option to use MLA without a transposed cache (#235)
2025-02-27  Kawrakow         Faster MLA on CUDA (#234)
2025-02-23  Kawrakow         Fused MoE ffn_up and ffn_gate (#229)
2025-02-07  Kawrakow         cuda: non-contiguous rms norm (#190)
2024-11-21  Kawrakow         MMQ for Q6_0 (#115)
2024-10-31  Kawrakow         Faster MoE inference (#112)
2024-10-26  Kawrakow         Bitnet CUDA improvements (#109)
2024-10-25  Kawrakow         Bitnet changes (#106)
2024-10-24  Kawrakow         Fix quantized k-cache without FA (#105)
2024-10-22  Kawrakow         Enable q6_0 for flash attention (#101)
2024-10-21  Kawrakow         Enable IQ4_NL for KV-cache in token generation using Flash Attention (#99)
2024-10-16  Kawrakow         Adding IQ4_KSS: 4.0 bpw quants (#89)