author | Iwan Kawrakow <iwan.kawrakow@gmail.com> | 2024-07-30 18:40:10 +0300 |
---|---|---|
committer | Kawrakow <48489457+ikawrakow@users.noreply.github.com> | 2024-08-01 09:38:06 +0200 |
commit | 9d0cf7a399b1cf870fc25f7a63d5b0c6af020604 (patch) | |
tree | daff541ab76575ab762ca643a47001ec94c548fd /examples/server/tests | |
parent | fd1ae85a329e8148d1de20dc6ef5302110d53b73 (diff) |
iq3_k: AVX512 iqk_mul_mat
We get PP-512 = 180 t/s and TG-128 (4 threads) = 16.35 t/s on the
Ryzen-7950X for LLaMA-3.1-8B.
In comparison, iq3_s has PP-512 = 96 t/s, TG-128 = 7.6 t/s with
iqk_mul_mat, and PP-512 = 28 t/s, TG-128 = 6.8 t/s in mainline llama.cpp.
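The relative gains implied by these figures can be worked out with a few lines of Python. This is only an illustrative sketch: the throughput values are copied from the message above, while the dictionary layout and the `speedup` helper are hypothetical names introduced here and are not part of the commit. Note that the comparison is between different quantization types (iq3_k vs. iq3_s), and the thread count for the iq3_s TG-128 runs is not stated, so the ratios are indicative only.

```python
# Throughput in tokens/s, copied from the commit message.
# The keys and structure are illustrative, not taken from the code base.
RESULTS = {
    "iq3_k / iqk_mul_mat": {"PP-512": 180.0, "TG-128": 16.35},
    "iq3_s / iqk_mul_mat": {"PP-512": 96.0, "TG-128": 7.6},
    "iq3_s / mainline llama.cpp": {"PP-512": 28.0, "TG-128": 6.8},
}

REFERENCE = "iq3_k / iqk_mul_mat"


def speedup(baseline: str, test: str) -> float:
    """Ratio of iq3_k throughput to the given baseline for one test."""
    return RESULTS[REFERENCE][test] / RESULTS[baseline][test]


if __name__ == "__main__":
    for name in RESULTS:
        if name == REFERENCE:
            continue
        for test in ("PP-512", "TG-128"):
            print(f"{test}: iq3_k is {speedup(name, test):.2f}x {name}")
```

With these inputs, prompt processing is a little under 2x faster than iq3_s with iqk_mul_mat and more than 6x faster than iq3_s in mainline llama.cpp, while token generation is a bit over 2x faster in both cases.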
Diffstat (limited to 'examples/server/tests')
0 files changed, 0 insertions, 0 deletions