index : ik_llama.cpp.git (branch: main)
Age | Commit message | Author
2024-09-04 | Performance improvements for legacy quants on ARM_NEON (#37) | Kawrakow
2024-09-04 | Zen4 Flash Attention 2 (#36) | Kawrakow
2024-09-02 | Fix Zen4 Flash Attention (#35) | Kawrakow
2024-09-02 | Do not process prompts containing binary data for escapes (#33) | Kawrakow
2024-09-01 | Zen4 Flash Attention (#32) | Kawrakow
2024-08-31 | Fix build when iqk_mul_mat is disabled (#31) | Kawrakow
2024-08-27 | Faster Gemma2 (#27) | Kawrakow
2024-08-21 | softcap: minor improvement (#24) | Kawrakow
2024-08-20 | Fused soft cap and SIMD-ified GeLU (#9) | Kawrakow
2024-08-20 | iq4_k: use iq5_k also when n_gqa = 2 (#23) | Kawrakow
2024-08-19 | AVX2 quantization for Q8_K (#22) | Kawrakow
2024-08-19 | quantize_stats: print rmse and max error as fraction of <x> (#21) | Kawrakow
2024-08-19 | iq2_k: slightly better bpw - accuracy compromise (#20) | Kawrakow
2024-08-14 | Skip barriers of noops (#19) | Kawrakow
2024-08-12 | Update README.md | Kawrakow
2024-08-12 | Merge mainline - Aug 12 2024 (#17) | Kawrakow
2024-08-09 | Fix Makefile | Iwan Kawrakow
2024-08-09 | Fix Zen4 implementation of iq3_k, iq4_k, iq5_k | Iwan Kawrakow
2024-08-09 | iq6_k: AVX2 | Iwan Kawrakow
2024-08-09 | iq6_k: Metal | Iwan Kawrakow
2024-08-09 | iq6_k: NEON | Iwan Kawrakow
2024-08-09 | iq6_k: slightly better Zen4 iqk_mul_mat | Iwan Kawrakow
2024-08-09 | iq6_k: Zen4 iqk_mul_mat | Iwan Kawrakow
2024-08-09 | iq6_k: CUDA dot product | Iwan Kawrakow
2024-08-09 | iq6_k: CUDA dequantize | Iwan Kawrakow
2024-08-09 | iq6_k: WIP (quantize/dequantize) | Iwan Kawrakow
2024-08-09 | iq6_k: WIP (nothing works) | Iwan Kawrakow
2024-08-07 | Adding IQ2_TN for use with ternary models (#13) | Kawrakow
2024-08-05 | q2_K: allow it to detect ternary nets and quantize accordingly | Iwan Kawrakow
2024-08-05 | Update README.md | Kawrakow
2024-08-05 | iq3_k, iq5_k: faster quantization | Iwan Kawrakow
2024-08-03 | iq4_k: speedup quantization by a factor of ~2 | Iwan Kawrakow
2024-08-01 | Add copyright notice | Iwan Kawrakow
2024-08-01 | iq2/3_k: tiny bit faster Metal dot products | Iwan Kawrakow
2024-08-01 | iq3_k: slightly faster Metal dequantize kernel | Iwan Kawrakow
2024-08-01 | iq3_k: Metal dot product | Iwan Kawrakow
2024-08-01 | iq2_k: Metal dot product finally works | Iwan Kawrakow
2024-08-01 | iq3_k: Metal dequantize | Iwan Kawrakow
2024-08-01 | iq3_k: NEON | Iwan Kawrakow
2024-08-01 | iq3_k: AVX2 iqk_mul_mat | Iwan Kawrakow
2024-08-01 | iq3_k: AVX512 iqk_mul_mat | Iwan Kawrakow
2024-08-01 | iq3_k: faster CUDA dot product | Iwan Kawrakow
2024-08-01 | iq3_k: CUDA dot product | Iwan Kawrakow
2024-08-01 | iq3_k: Basics | Iwan Kawrakow
2024-08-01 | iq2_k: very slightly better CUDA dot product | Iwan Kawrakow
2024-08-01 | iq2_k: better CUDA dot product | Iwan Kawrakow
2024-08-01 | iq2_k: CUDA dot product finally works | Iwan Kawrakow
2024-08-01 | iq5_k: CUDA dot product finally works | Iwan Kawrakow
2024-08-01 | Factor out iqk CUDA dot products | Iwan Kawrakow
2024-08-01 | iq5_k: CUDA dot product still not working | Iwan Kawrakow