author Iwan Kawrakow <iwan.kawrakow@gmail.com> 2024-07-30 18:40:10 +0300
committer Kawrakow <48489457+ikawrakow@users.noreply.github.com> 2024-08-01 09:38:06 +0200
commit 9d0cf7a399b1cf870fc25f7a63d5b0c6af020604
tree daff541ab76575ab762ca643a47001ec94c548fd /examples/server/utils.hpp
parent fd1ae85a329e8148d1de20dc6ef5302110d53b73
iq3_k: AVX512 iqk_mul_mat
We get PP-512 = 180 t/s and TG-128 (4 threads) = 16.35 t/s on the Ryzen-7950X for LLaMA-3.1-8B. In comparison, iq3_s has PP-512 = 96 t/s, TG-128 = 7.6 t/s with iqk_mul_mat, and PP-512 = 28 t/s, TG-128 = 6.8 t/s in mainline llama.cpp.
Diffstat (limited to 'examples/server/utils.hpp')
0 files changed, 0 insertions, 0 deletions