path: root/ggml-quants.c
author: Iwan Kawrakow <iwan.kawrakow@gmail.com> 2024-06-10 09:53:26 +0300
committer: Iwan Kawrakow <iwan.kawrakow@gmail.com> 2024-06-22 12:02:50 +0300
commit: ae1e77c5dee9b513e0b710075ef6713ede821b3c
tree: fa8838c2e15bc5723503510d6e46f1da1473accf /ggml-quants.c
parent: 9386b499181a1d89c39e3a8114ef3255e9d52e63
iqk_mul_mat: better fp16 for AVX2
Basically, use what I did for Arm. On the Ryzen-7950X (32 vector registers, so we use 5x5 tiling) this improves prompt-processing (PP) performance from 136 t/s to 141.7 t/s, which is now 10% faster than tinyBLAS. There is also a minor improvement on the Ryzen-5975WX (16 vector registers, so we use 4x3 tiling): 138 t/s, up from 136 t/s, while tinyBLAS is at 132 t/s.
Diffstat (limited to 'ggml-quants.c')
0 files changed, 0 insertions, 0 deletions