author | Kawrakow <iwankawrakow@gmail.com> | 2025-06-23 11:55:50 +0200 |
---|---|---|
committer | GitHub <noreply@github.com> | 2025-06-23 11:55:50 +0200 |
commit | 4776dd280976784eb0abd743186cc30370104b78 (patch) | |
tree | da92dccc81dcc3ea27073c69947b661b54490793 /ggml/src/ggml-rpc.cpp | |
parent | cac763fc2086685ae535e89f94b728c6081e8aac (diff) |
Much faster prompt processing for IQK quants (ARM_NEON) (#549)
* Faster GEMM for iq2_ks, iq4_ks
* iq5_ks
63.8 t/s -> 166 t/s. iq5_ks_r4 is at 107.4 t/s.
But: iq5_ks_r4 TG performance is quite a bit better:
21.7 t/s vs 17.7 t/s for iq5_ks.
* iq6_k
44 t/s -> 164.3 t/s. There is no iq6_k_r4
* iq5_k
46 t/s -> 167 t/s. iq5_k_r4 is at 99.5 t/s.
* iq4_k
46.4 t/s -> 167.2 t/s. iq4_k_r4 is at 115 t/s.
* iq3_k
47.3 t/s -> 166.5 t/s. iq3_k_r4 is at 96.5 t/s.
* iq2_k
47.4 t/s -> 167 t/s. iq2_k_r4 is at 113.3 t/s.
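
The before/after figures above can be turned into speedup factors with a few lines of Python. This is only an illustrative sketch: the `results` table and the `speedup` helper are names made up for this example, the numbers are copied from the list above, and they are assumed to be prompt-processing throughput in t/s, as the commit title suggests.

```python
# Throughput in t/s, copied from the commit message above:
# (before this commit, after this commit, matching _r4 variant or None).
# Assumption: all values are prompt-processing figures.
results = {
    "iq5_ks": (63.8, 166.0, 107.4),
    "iq6_k": (44.0, 164.3, None),  # there is no iq6_k_r4
    "iq5_k": (46.0, 167.0, 99.5),
    "iq4_k": (46.4, 167.2, 115.0),
    "iq3_k": (47.3, 166.5, 96.5),
    "iq2_k": (47.4, 167.0, 113.3),
}


def speedup(baseline, improved):
    """Return how many times faster `improved` is than `baseline`."""
    return improved / baseline


for name, (before, after, r4) in results.items():
    line = f"{name}: {speedup(before, after):.2f}x vs. the previous kernel"
    if r4 is not None:
        line += f", {speedup(r4, after):.2f}x vs. {name}_r4"
    print(line)
```

The same helper could be applied to the TG numbers quoted for iq5_ks (21.7 t/s vs 17.7 t/s), where the _r4 variant remains ahead.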
---------
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
Diffstat (limited to 'ggml/src/ggml-rpc.cpp')
0 files changed, 0 insertions, 0 deletions