ik_llama.cpp.git — commit log (branch: main)

| Age | Commit message | Author |
|-----|----------------|--------|
| 2025-07-20 | Webui: New Features for Conversations, Settings, and Chat Messages (#618) | firecoperana |
| 2025-07-20 | Adding IQ1_KT - 1.75 bpw SOTA quants (#616) | Kawrakow |
| 2025-07-20 | IQ1_M GEMM for ARM_NEON (#631) | Kawrakow |
| 2025-07-18 | Remove forgotten change | Iwan Kawrakow |
| 2025-07-18 | GEMM for iq1_m (#630) | Kawrakow |
| 2025-07-17 | Add GGML_MAX_CONTEXTS definition in CMakeLists.txt (#622) | Thireus ☠ |
| 2025-07-17 | Bump Windows max open files from 512 to 2048 (#620) | Thireus ☠ |
| 2025-07-16 | Fixup kimi-k2 convert indentation (#617) | ubergarm |
| 2025-07-16 | Bump GGML_MAX_CONTEXTS to allow loading more shards (#611) | Thireus ☠ |
| 2025-07-15 | kimi-k2 convert script and chat template (#612) | ubergarm |
| 2025-07-15 | Vulkan: a fresh start (#608) | Kawrakow |
| 2025-07-14 | Adding IQ2_KL (#602) | Kawrakow |
| 2025-07-14 | Ported kimi-k2 support from llama.cpp (#609) | Aleksey Nikiforov |
| 2025-07-13 | Add iq3_ks to constants.py (#606) | Kawrakow |
| 2025-07-13 | Fix attn_v conditionality (#604) | Nexes the Elder |
| 2025-07-13 | Check if MMQ should be used before using it (#603) | Kawrakow |
| 2025-07-10 | Support for dots.llm1 models (#573) | saood06 |
| 2025-07-10 | CUDA: Faster prompt processing for several quantization types (#595) | Kawrakow |
| 2025-07-09 | add hunyuan moe support for 561 (#565) | ubergarm |
| 2025-07-08 | Faster prompt processing for IQ2_KS, IQ2_K, IQ2_K_R4 (#593) | Kawrakow |
| 2025-07-07 | CUDA: small PP performance improvement for MoE models (#589) | Kawrakow |
| 2025-07-06 | Special handling of Seed Coder FIM tokens (#585) | Fizz~ |
| 2025-07-06 | Fix server crash when there is no DRY sampler (#588) | firecoperana |
| 2025-07-05 | Vulkan: flash attention for DeepSeek models (#584) | Kawrakow |
| 2025-07-04 | Adding forgotten file (#583) | Kawrakow |
| 2025-07-04 | Vulkan: adding GGML_OP_MULTI_ADD implementation (#582) | Kawrakow |
| 2025-07-03 | Vulkan: Disable multi-add for now (#581) | Kawrakow |
| 2025-07-03 | Vulkan: add GGML_OP_FUSED_MUL_UNARY (#580) | Kawrakow |
| 2025-07-03 | Vulkan: fused rms norm (#577) | Kawrakow |
| 2025-07-03 | Do not crash when there is no DRY sampler (#578) | Kawrakow |
| 2025-07-03 | Fix debug build failure with RPC off (#579) | Kawrakow |
| 2025-07-03 | Chnage KQ mask padding to 64 (#574) | Kawrakow |
| 2025-07-02 | Fix CMakeLists (#571) | Kawrakow |
| 2025-07-02 | Adding IQ3_KS quants (#566) | Kawrakow |
| 2025-07-02 | Minor CUDA PP speed improvement (#567) | Kawrakow |
| 2025-07-02 | Conditionally disable fused ops when building with Vulkan enabled (#569) | Kawrakow |
| 2025-07-02 | Merge vulkan code from mainline up to commit of 6/28/2025 (#563) | firecoperana |
| 2025-06-27 | Remove what appears to be unnecessary asserts in ggml_cuda_cpy (#560) | Kawrakow |
| 2025-06-27 | Use cuBLAS for large batches and quants with block size 16 (#559) | Kawrakow |
| 2025-06-26 | CUDA: MMQ for iqX_r4 quants (#557) | Kawrakow |
| 2025-06-26 | Add Falcon-Edge support (#555) | Kawrakow |
| 2025-06-24 | Much faster prompt processing for IQ1_S and IQ1_M on ARM_NEON (#553) | Kawrakow |
| 2025-06-24 | Much faster prompt processing for k-quants (ARM_NEON) (#552) | Kawrakow |
| 2025-06-23 | Much faster prompt processing for I-quants (ARM_NEON) (#550) | Kawrakow |
| 2025-06-23 | Much faster prompt processing for IQK quants (ARM_NEON) (#549) | Kawrakow |
| 2025-06-22 | To use GGML_ABORT we need to include ggml-impl.h. | Iwan Kawrakow |
| 2025-06-22 | Abort if IQK_IMPLEMENT is not defined | Iwan Kawrakow |
| 2025-06-21 | Faster ARM_NEON GEMM implementation for legacy quants (#546) | Kawrakow |
| 2025-06-21 | Perhaps slightly faster trellis quants (#541) | Kawrakow |
| 2025-06-20 | New integer trellis on ARM_NEON (#544) | Kawrakow |