author Iwan Kawrakow <iwan.kawrakow@gmail.com> 2024-07-18 13:55:51 +0200
committer Iwan Kawrakow <iwan.kawrakow@gmail.com> 2024-07-18 13:55:51 +0200
commit 30b8bcf1a3bf232aabcbb826c7a2769dda6eafa0
tree 40d4e4eb9274afcef5a751999e82cd7011b1ffe4
parent 8db01c0804b603cb76bbee82ebb1a144c8d3592e
iqk_mul_mat(f16): make it work for row sizes that are multiple of 4 on NEON
Here the performance gain is more modest than on AVX2: PP-512 = 200 t/s, up from 190 t/s, for iq1_bn-quantized Bitnet-3B running on an M2 Max.
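
As a rough illustration of what supporting row sizes that are a multiple of 4 can involve, here is a portable scalar sketch, not the code from this commit. It assumes the kernel works in 4-wide lanes: the main loop consumes 8 elements per step, and a single 4-element tail handles rows where n % 8 == 4. The function name dot_rows_mult4 is hypothetical, and float is used in place of f16 so the sketch compiles without NEON.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, not taken from the commit: dot product of two rows
// whose length n is a multiple of 4. The two 4-element accumulators stand in
// for two 4-lane SIMD registers.
float dot_rows_mult4(const float* x, const float* y, std::size_t n) {
    assert(n % 4 == 0);  // assumption of this sketch: row size is a multiple of 4

    float acc0[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    float acc1[4] = {0.0f, 0.0f, 0.0f, 0.0f};

    std::size_t i = 0;
    // Main loop: 8 elements per iteration, split across both accumulators.
    for (; i + 8 <= n; i += 8) {
        for (int k = 0; k < 4; ++k) {
            acc0[k] += x[i + k] * y[i + k];
            acc1[k] += x[i + 4 + k] * y[i + 4 + k];
        }
    }
    // Tail: at most one remaining group of 4 (when n % 8 == 4).
    if (i + 4 <= n) {
        for (int k = 0; k < 4; ++k) {
            acc0[k] += x[i + k] * y[i + k];
        }
    }

    // Horizontal reduction of both accumulators.
    float sum = 0.0f;
    for (int k = 0; k < 4; ++k) {
        sum += acc0[k] + acc1[k];
    }
    return sum;
}
```

Without the tail step, a kernel that only steps by 8 would either skip the last 4 elements or have to reject such row sizes.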
Diffstat (limited to 'llama.cpp')
0 files changed, 0 insertions, 0 deletions