ik_llama.cpp.git (branch: main)
path: ggml/src

Date        Commit message  (Author)
2024-10-19  Attempt to blindly fix Windows build failure (#93)  (Kawrakow)
2024-10-16  Adding IQ4_KSS: 4.0 bpw quants (#89)  (Kawrakow)
2024-10-16  iq4_ks: faster dot product on Metal (#90)  (Kawrakow)
2024-10-14  Minor iq3_k tweak  (Iwan Kawrakow)
2024-10-14  iq3_k: fix and optimize Metal dot product (#87)  (Kawrakow)
2024-10-13  Fix and optimize iq2k Metal implementation (#86)  (Kawrakow)
2024-10-13  IQ2_KS: 2.1875 bpw non-linear quantization (#85)  (Kawrakow)
2024-10-09  New SOTA quantization: 4.25 bpw IQ4_KS (#83)  (Kawrakow)
2024-10-04  Fix compiler warnings  (Iwan Kawrakow)
2024-10-04  Move scale fudge factors to quantization (#81)  (Kawrakow)
2024-10-04  Move to c++17 projectwide (#80)  (Kawrakow)
2024-10-04  Do not quantize activations if not necessary (#79)  (Kawrakow)
2024-10-02  q6_0: Slightly faster Zen4/AVX2 (#78)  (Kawrakow)
2024-10-02  Fused unary(x)*y (#70)  (Kawrakow)
2024-10-02  Adding Q6_0 (#77)  (Kawrakow)
2024-10-02  iq4_nl: faster quantization (#76)  (Kawrakow)
2024-10-01  Fix Q5_0 flash attention (#75)  (Kawrakow)
2024-10-01  Fix last commit  (Iwan Kawrakow)
2024-10-01  IQ4_NL kv-cache on the CPU (Zen4/AVX2/ARM_NEON) (#74)  (Kawrakow)
2024-10-01  CUDA: faster float -> iq4_nl conversion (#73)  (Kawrakow)
2024-10-01  iqk_mul_mat: better iq4_nl implementation on Zen4/AVX2 (#72)  (Kawrakow)
2024-10-01  iqk_mul_mat: better strategy when nrc_y not divisible by ny (#71)  (Kawrakow)
2024-09-29  Allow bf16 kv-cache (#69)  (Kawrakow)
2024-09-28  CUDA non-contiguous RoPE (#66)  (Kawrakow)
2024-09-28  Adding SWIGLU unary op (#65)  (Kawrakow)
2024-09-27  Adding ability to have meta data per tensor row (#61)  (Kawrakow)
2024-09-25  Use fp32 for K*Q in Metal FA implementation (#62)  (Kawrakow)
2024-09-17  Fix compiler warnings (#58)  (Kawrakow)
2024-09-17  BF16 support on Metal (#56)  (Kawrakow)
2024-09-16  iqk_mul_mat(ARM_NEON): adding bf16 support (#41)  (Kawrakow)
2024-09-15  Minor  (Iwan Kawrakow)
2024-09-14  Adding bf16 support to CUDA (#40)  (Kawrakow)
2024-09-14  Improve Q5_0 performance (#55)  (Kawrakow)
2024-09-14  Improve Q4_0 and Q8_0 performance on AVX2/Zen4 (#54)  (Kawrakow)
2024-09-13  Minor  (Iwan Kawrakow)
2024-09-13  Fix bug and D < 128 case for Q8_0 k-cache (#52)  (Kawrakow)
2024-09-12  Quantized Flash Attention for all supported CPU platforms (#51)  (Kawrakow)
2024-09-11  AVX2 Flash Attention 2 (#50)  (Kawrakow)
2024-09-11  ARM_NEON Flash Attention (#49)  (Kawrakow)
2024-09-10  AVX2 Flash Attention (#48)  (Kawrakow)
2024-09-10  iq2_tn: slightly better performance on AVX2 (#47)  (Kawrakow)
2024-09-10  IQ1_TN Metal implementation (#46)  (Kawrakow)
2024-09-09  Add CUDA support for IQ1_TN (#45)  (Kawrakow)
2024-09-09  Adding IQ1_TN - 1.6875 bpw for TriLM ternary models (#44)  (Kawrakow)
2024-09-08  iq2_tn: slightly faster PP (#43)  (Kawrakow)
2024-09-08  Adding fused rms_norm (#42)  (Kawrakow)
2024-09-05  Add support for bf16 to iqk_mul_mat (#39)  (Kawrakow)
2024-09-05  Zen4 Flash Attention - bf16 support (#38)  (Kawrakow)
2024-09-04  Performance improvements for legacy quants on ARM_NEON (#37)  (Kawrakow)
2024-09-04  Zen4 Flash Attention 2 (#36)  (Kawrakow)