ik_llama.cpp.git

author	Iwan Kawrakow <iwan.kawrakow@gmail.com>	2024-06-10 09:53:26 +0300
committer	Iwan Kawrakow <iwan.kawrakow@gmail.com>	2024-06-22 12:02:50 +0300
commit	ae1e77c5dee9b513e0b710075ef6713ede821b3c (patch)
tree	fa8838c2e15bc5723503510d6e46f1da1473accf /ggml-quants.c
parent	9386b499181a1d89c39e3a8114ef3255e9d52e63 (diff)

iqk_mul_mat: better fp16 for AVX2

This basically reuses what I did for Arm. Prompt processing (PP) performance on the Ryzen-7950X improves from 136 t/s to 141.7 t/s (32 vector registers, so we use 5x5 tiling). This is now 10% faster than tinyBLAS. There is also a minor improvement on the Ryzen-5975WX (16 vector registers, so we use 4x3 tiling): we get 138 t/s, up from 136 t/s. tinyBLAS is at 132 t/s.
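To illustrate why the tile size follows from the register count, here is a minimal scalar sketch in C. It is not the code from this commit. The register budget formula (one accumulator per tile element, one register per tile column for one operand, one scratch register for the other operand) is an assumption about how such a micro-kernel might be laid out, and the names tile_registers_needed, tile_fits and matmul_tiled are hypothetical. Under that assumption a 5x5 tile needs 31 registers and a 4x3 tile needs 16, which matches the two CPUs mentioned above.

```c
#include <stddef.h>

/* Assumed register budget for a tile_rows x tile_cols micro-kernel:
 * one accumulator per tile element, one register per tile column holding
 * one operand, plus one scratch register for the other operand. */
static int tile_registers_needed(int tile_rows, int tile_cols) {
    return tile_rows * tile_cols + tile_cols + 1;
}

/* Returns 1 if the tile fits in the given number of vector registers. */
static int tile_fits(int tile_rows, int tile_cols, int num_vector_regs) {
    return tile_registers_needed(tile_rows, tile_cols) <= num_vector_regs;
}

enum { TILE_M = 4, TILE_N = 3 }; /* the 4x3 case, chosen for illustration */

/* Scalar stand-in for a register-tiled kernel (hypothetical helper).
 * Computes c[i][j] = sum_p a[i][p] * b[j][p], with a stored as m x k,
 * b stored as n x k (both row-major), and c stored as m x n.
 * The acc array plays the role of the vector accumulators: it stays
 * "in registers" for the whole loop over k and is stored once per tile. */
static void matmul_tiled(const float *a, const float *b, float *c,
                         size_t m, size_t n, size_t k) {
    for (size_t i0 = 0; i0 < m; i0 += TILE_M) {
        for (size_t j0 = 0; j0 < n; j0 += TILE_N) {
            float acc[TILE_M][TILE_N] = {{0.0f}};
            /* Clamp the tile at the matrix edges. */
            size_t im = (m - i0 < (size_t)TILE_M) ? m - i0 : (size_t)TILE_M;
            size_t jn = (n - j0 < (size_t)TILE_N) ? n - j0 : (size_t)TILE_N;
            for (size_t p = 0; p < k; ++p) {
                for (size_t i = 0; i < im; ++i) {
                    for (size_t j = 0; j < jn; ++j) {
                        acc[i][j] += a[(i0 + i) * k + p] * b[(j0 + j) * k + p];
                    }
                }
            }
            for (size_t i = 0; i < im; ++i) {
                for (size_t j = 0; j < jn; ++j) {
                    c[(i0 + i) * n + (j0 + j)] = acc[i][j];
                }
            }
        }
    }
}
```

A larger tile reuses each loaded value more often, so the kernel does more multiply-adds per load. That is why a CPU with 32 vector registers can afford a bigger tile than one with 16.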

Diffstat (limited to 'ggml-quants.c')

0 files changed, 0 insertions, 0 deletions

