path: root/ggml/src/ggml-backend.c
author    Kawrakow <iwankawrakow@gmail.com>  2025-01-30 09:28:53 +0200
committer GitHub <noreply@github.com>        2025-01-30 09:28:53 +0200
commit    2e6b523853a8659c63283a6deca805051ecd713a (patch)
tree      2b64156fa1c7004403a070efb030cf4a61805825 /ggml/src/ggml-backend.c
parent    4a73c250023a74bb1665875bbced7f1a3857b7f6 (diff)
Faster Q4_K_R4 and Q5_K_R4 on AVX2/Zen4 (#182)
* Slightly faster AVX2 implementation for q4_k_r4

* Even better AVX2 implementation for q4_k_r4

  We now arrive at PP-512 = 328 t/s for LLaMA-3.1-8B on a Ryzen-5975WX CPU, up from 291 t/s when I last measured on 3c5f8722. With FA and Q8_0 K-cache we get to 339.5 t/s.

* Fix llama-bench labels that I broke with #181

* Faster AVX2 implementation for q5_k_r4

  We arrive at 302 t/s for LLaMA-3.1-8B on a Ryzen-5975WX CPU, up from 273 t/s.

* Use AVX2 implementation of q4_k_r4 and q5_k_r4 also on Zen4

  After the changes I made to AVX2, it ends up being slightly faster than what I had for Zen4.

* Minor tweak

* Cleanup

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
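The kernels this commit speeds up are not shown on this page (the diff view is limited to ggml/src/ggml-backend.c, which the commit does not touch). As context only, the sketch below shows the basic AVX2 building block that 4-bit quant kernels of this kind rely on: splitting packed 4-bit quants into nibbles with one mask and one shift. This is an illustrative sketch, not code from the commit, and unpack_q4_nibbles_avx2 is a hypothetical helper name.

```c
// Illustrative sketch only -- not the kernel changed by this commit.
// 32 packed bytes hold 64 unsigned 4-bit quants; AVX2 splits them into
// low and high nibbles with one AND and one 4-bit right shift.
#include <immintrin.h>
#include <stdint.h>

// Hypothetical helper: produce two vectors of 32 values each in 0..15.
static inline void unpack_q4_nibbles_avx2(const uint8_t * packed,
                                          __m256i * lo, __m256i * hi) {
    const __m256i m4 = _mm256_set1_epi8(0x0f);            // nibble mask
    const __m256i q  = _mm256_loadu_si256((const __m256i *)packed);
    *lo = _mm256_and_si256(q, m4);                        // low 4 bits of each byte
    *hi = _mm256_and_si256(_mm256_srli_epi16(q, 4), m4);  // high 4 bits of each byte
}
```

In a real Q4_K/Q5_K kernel the unpacked nibbles would then be combined with the per-block scales and minimums before the dot products are accumulated; those details vary per quant type and are omitted here.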
Diffstat (limited to 'ggml/src/ggml-backend.c')
0 files changed, 0 insertions, 0 deletions