ik_llama.cpp.git

author	Kawrakow <iwankawrakow@gmail.com>	2024-10-26 10:59:59 +0200
committer	GitHub <noreply@github.com>	2024-10-26 10:59:59 +0200
commit	f7b05a09ddb2b2579f6301a6223d894f5b97c494 (patch)
tree	edd935f7838ab639a8f174d0dcbc30d96e3b154d /ggml/src/ggml-cuda/template-instances
parent	19cc3329bf00e2de2fd7377015c157d6733089b7 (diff)

Faster IQ1_BN Metal implementation (#107)

* iq1_bn: faster Metal dot product

  82 t/s -> 87.9 t/s

* iq1_bn(Metal): 87.9 -> 89.0 t/s for TG-128

* iq1_bn(Metal): 89.0 -> 94.7 t/s for TG-128

  So, total improvement is ~15%. Not bad.

* iq1_bn(Metal): 686 -> 702 t/s for PP-512

* iq2_bn(Metal): 710 -> 714 t/s for PP-512

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>

Diffstat (limited to 'ggml/src/ggml-cuda/template-instances')

0 files changed, 0 insertions, 0 deletions

