author | Kawrakow <iwankawrakow@gmail.com> | 2024-10-26 10:59:59 +0200 |
---|---|---|
committer | GitHub <noreply@github.com> | 2024-10-26 10:59:59 +0200 |
commit | f7b05a09ddb2b2579f6301a6223d894f5b97c494 (patch) | |
tree | edd935f7838ab639a8f174d0dcbc30d96e3b154d /ggml/src/ggml-cuda/template-instances | |
parent | 19cc3329bf00e2de2fd7377015c157d6733089b7 (diff) |
Faster IQ1_BN Metal implementation (#107)
* iq1_bn: faster Metal dot product
82 t/s -> 87.9 t/s
* iq1_bn(Metal): 87.9 -> 89.0 t/s for TG-128
* iq1_bn(Metal): 89.0 -> 94.7 t/s for TG-128
So, total improvement is ~15%. Not bad.
* iq1_bn(Metal): 686 -> 702 t/s for PP-512
* iq2_bn(Metal): 710 -> 714 t/s for PP-512
---------
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
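
The relative gains quoted above can be checked with a short calculation. This is only a sketch: the helper name `speedup_percent` is hypothetical, and it assumes the initial 82 t/s figure refers to the same TG-128 benchmark as the later steps.

```python
def speedup_percent(before: float, after: float) -> float:
    """Relative throughput gain in percent, given tokens/s before and after."""
    return (after / before - 1.0) * 100.0


# (before, after) throughput in t/s, taken from the commit message.
# Assumption: the 82 t/s baseline is the TG-128 number.
results = {
    "iq1_bn TG-128": (82.0, 94.7),
    "iq1_bn PP-512": (686.0, 702.0),
    "iq2_bn PP-512": (710.0, 714.0),
}

for name, (before, after) in results.items():
    gain = speedup_percent(before, after)
    print(f"{name}: {before} -> {after} t/s (+{gain:.1f}%)")
```

Under that assumption the TG-128 gain comes out at about 15.5%, consistent with the "~15%" in the message, while the PP-512 gains are much smaller.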
Diffstat (limited to 'ggml/src/ggml-cuda/template-instances')
0 files changed, 0 insertions, 0 deletions