author Iwan Kawrakow <iwan.kawrakow@gmail.com> 2024-05-29 10:38:58 +0300
committer Iwan Kawrakow <iwan.kawrakow@gmail.com> 2024-06-22 12:02:49 +0300
commit 2c8c0d0a68d78f0aaf7c756849f97d0a5e655afe
tree 725d1eee1babec05bafa0fd792ba0138b648d117 /examples/server/bench
parent 34befcaf6731a9a29bb5d7f3f2472e53c4151898
iqk_mul_mat: AVX2 implementation for iq3_xxs
We get a 2.3X speedup for PP-512 (87 t/s). For TG, however, we still need to use the original implementation in llama.cpp, because the template cannot match the performance of that special-purpose implementation. Also, 87 t/s is significantly lower than the 111 t/s I get in iquants.
Diffstat (limited to 'examples/server/bench')
0 files changed, 0 insertions, 0 deletions