author | Iwan Kawrakow <iwan.kawrakow@gmail.com> | 2024-06-10 09:53:26 +0300 |
---|---|---|
committer | Iwan Kawrakow <iwan.kawrakow@gmail.com> | 2024-06-22 12:02:50 +0300 |
commit | ae1e77c5dee9b513e0b710075ef6713ede821b3c (patch) | |
tree | fa8838c2e15bc5723503510d6e46f1da1473accf /sgemm.cpp | |
parent | 9386b499181a1d89c39e3a8114ef3255e9d52e63 (diff) |
iqk_mul_mat: better fp16 for AVX2
Use the same approach as in the Arm implementation.
Prompt-processing (PP) performance improves from 136 t/s to 141.7 t/s
on the Ryzen-7950X (32 vector registers, so we use 5x5 tiling).
This is now 10% faster than tinyBLAS.
There is also a minor improvement on the Ryzen-5975WX
(16 vector registers, so we use 4x3 tiling): 138 t/s, up from
136 t/s. tinyBLAS is at 132 t/s.
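
The link between register count and tile size can be illustrated with a rough register budget. The sketch below is not taken from the commit: `tile_registers` and `tile_kernel` are hypothetical names, and the accounting (M*N accumulators, N live vectors from one operand, 1 scratch vector for the other) is an assumption that happens to match both tilings quoted above. The scalar kernel only mimics the loop structure; each `float` stands in for one vector register.

```cpp
#include <cassert>
#include <vector>

// Assumed register budget for an M x N micro-kernel (hypothetical accounting):
// M*N accumulators + N vectors of one operand kept live + 1 scratch vector.
constexpr int tile_registers(int m, int n) { return m * n + n + 1; }

static_assert(tile_registers(5, 5) <= 32, "5x5 fits in 32 vector registers");
static_assert(tile_registers(4, 3) <= 16, "4x3 fits in 16 vector registers");
static_assert(tile_registers(5, 5) > 16, "5x5 does not fit in 16 registers");

// Scalar stand-in for a tiled kernel: computes the M x N block
// c[i][j] = sum over l of a[i][l] * b[j][l], where a has M rows and b has
// N rows of length k. Every acc[i][j] plays the role of one accumulator register.
template <int M, int N>
void tile_kernel(const float* a, const float* b, float* c,
                 int k, int lda, int ldb, int ldc) {
    assert(k > 0 && lda >= k && ldb >= k && ldc >= N);
    float acc[M][N] = {};
    for (int l = 0; l < k; ++l) {
        float bv[N];  // N "registers" loaded once per step
        for (int j = 0; j < N; ++j) bv[j] = b[j * ldb + l];
        for (int i = 0; i < M; ++i) {
            const float av = a[i * lda + l];  // the single scratch "register"
            for (int j = 0; j < N; ++j) acc[i][j] += av * bv[j];
        }
    }
    for (int i = 0; i < M; ++i)
        for (int j = 0; j < N; ++j) c[i * ldc + j] = acc[i][j];
}
```

Under this accounting a 5x5 tile needs 31 registers and a 4x3 tile needs exactly 16, so a larger tile (more reuse of each loaded value) is only available when 32 registers are.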
Diffstat (limited to 'sgemm.cpp')
0 files changed, 0 insertions, 0 deletions