author | Kawrakow <iwankawrakow@gmail.com> | 2025-01-30 09:28:53 +0200 |
---|---|---|
committer | GitHub <noreply@github.com> | 2025-01-30 09:28:53 +0200 |
commit | 2e6b523853a8659c63283a6deca805051ecd713a (patch) | |
tree | 2b64156fa1c7004403a070efb030cf4a61805825 /ggml/src/ggml-backend.c | |
parent | 4a73c250023a74bb1665875bbced7f1a3857b7f6 (diff) |
Faster Q4_K_R4 and Q5_K_R4 on AVX2/Zen4 (#182)
* Slightly faster AVX2 implementation for q4_k_r4
* Even better AVX2 implementation for q4_k_r4
We now arrive at PP-512 = 328 t/s for LLaMA-3.1-8B on a
Ryzen-5975WX CPU, up from 291 t/s when I last measured
on 3c5f8722.
With FA and Q8_0 K-cache we get to 339.5 t/s.
* Fix llama-bench labels that I broke with #181
* Faster AVX2 implementation for q5_k_r4
We arrive at 302 t/s for LLaMA-3.1-8B on a Ryzen-5975WX CPU,
up from 273 t/s.
* Use AVX2 implementation of q4_k_r4 and q5_k_r4 also on Zen4
After the changes I made to AVX2, it ends up being slightly faster
compared to what I had for Zen4.
* Minor tweak
* Cleanup
---------
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
Diffstat (limited to 'ggml/src/ggml-backend.c')
0 files changed, 0 insertions, 0 deletions
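
For reference, the relative gains implied by the throughput figures quoted in the commit message can be worked out with a few lines of Python. This is only an illustrative sketch: the helper name `relative_speedup` is hypothetical and not part of the repository, and the only inputs are the t/s numbers stated above for LLaMA-3.1-8B on the Ryzen-5975WX.

```python
def relative_speedup(old_tps: float, new_tps: float) -> float:
    """Return the throughput gain of new_tps over old_tps, in percent."""
    if old_tps <= 0:
        raise ValueError("old_tps must be positive")
    return (new_tps - old_tps) / old_tps * 100.0


# Figures taken from the commit message (tokens per second).
q4_k_r4_gain = relative_speedup(291.0, 328.0)  # q4_k_r4, PP-512
q5_k_r4_gain = relative_speedup(273.0, 302.0)  # q5_k_r4

print(f"q4_k_r4: +{q4_k_r4_gain:.1f}%")  # prints "q4_k_r4: +12.7%"
print(f"q5_k_r4: +{q5_k_r4_gain:.1f}%")  # prints "q5_k_r4: +10.6%"
```

Both gains are measured against the earlier figures given in the message; the 339.5 t/s result with FA and a Q8_0 K-cache is a different configuration and is left out of the comparison.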