ik_llama.cpp.git

author	Iwan Kawrakow <iwan.kawrakow@gmail.com>	2024-07-18 13:55:51 +0200
committer	Iwan Kawrakow <iwan.kawrakow@gmail.com>	2024-07-18 13:55:51 +0200
commit	30b8bcf1a3bf232aabcbb826c7a2769dda6eafa0 (patch)
tree	40d4e4eb9274afcef5a751999e82cd7011b1ffe4 /llama.cpp
parent	8db01c0804b603cb76bbee82ebb1a144c8d3592e (diff)

iqk_mul_mat(f16): make it work for row sizes that are multiple of 4 on NEON

The performance gain here is more modest than on AVX2: PP-512 goes from 190 t/s to 200 t/s for iq1_bn-quantized Bitnet-3B running on an M2 Max.
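
For illustration only, here is a minimal sketch of how a row dot product can accept any row size that is a multiple of 4: a wide main loop followed by a tail loop over 4-element blocks. It is not the code from this commit. The function name `dot_rows_mult4` is hypothetical, the 16-element main step is an assumption, and plain `float` arrays stand in for f16 data and NEON registers so that the sketch compiles on any platform.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not from the commit): dot product of two rows whose
// length n is a multiple of 4 but not necessarily of 16.
float dot_rows_mult4(const float* x, const float* y, std::size_t n) {
    assert(n % 4 == 0);
    // Four partial sums stand in for the lanes of one 4-wide SIMD register.
    float acc[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    std::size_t i = 0;
    // Main loop: 16 elements (four 4-lane blocks) per iteration.
    // The step of 16 is an assumption made for this sketch.
    for (; i + 16 <= n; i += 16) {
        for (std::size_t k = 0; k < 16; ++k) {
            acc[k % 4] += x[i + k] * y[i + k];
        }
    }
    // Tail loop: the remaining 0 to 3 blocks of 4 elements each.
    for (; i < n; i += 4) {
        for (std::size_t k = 0; k < 4; ++k) {
            acc[k] += x[i + k] * y[i + k];
        }
    }
    // Horizontal reduction of the four lanes.
    return (acc[0] + acc[1]) + (acc[2] + acc[3]);
}
```

With a row size of 20, for example, the main loop runs once and the tail loop handles the last 4 elements. Without the tail loop, the final block of any row size not divisible by 16 would be dropped from the sum.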

Diffstat (limited to 'llama.cpp')

0 files changed, 0 insertions, 0 deletions