path: root/ggml/src/ggml-cuda
author: Kawrakow <iwankawrakow@gmail.com> 2024-10-26 10:59:59 +0200
committer: GitHub <noreply@github.com> 2024-10-26 10:59:59 +0200
commit: f7b05a09ddb2b2579f6301a6223d894f5b97c494
tree: edd935f7838ab639a8f174d0dcbc30d96e3b154d /ggml/src/ggml-cuda
parent: 19cc3329bf00e2de2fd7377015c157d6733089b7
Faster IQ1_BN Metal implementation (#107)
* iq1_bn: faster Metal dot product

  82 t/s -> 87.9 t/s

* iq1_bn(Metal): 87.9 -> 89.0 t/s for TG-128

* iq1_bn(Metal): 89.0 -> 94.7 t/s for TG-128

  So, total improvement is ~15%. Not bad.

* iq1_bn(Metal): 686 -> 702 t/s for PP-512

* iq2_bn(Metal): 710 -> 714 t/s for PP-512

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
Diffstat (limited to 'ggml/src/ggml-cuda')
0 files changed, 0 insertions, 0 deletions