ik_llama.cpp.git

author	Iwan Kawrakow <iwan.kawrakow@gmail.com>	2024-06-05 12:41:55 +0300
committer	Iwan Kawrakow <iwan.kawrakow@gmail.com>	2024-06-22 12:02:49 +0300
commit	e67626533cf8b1e63dcfb20c8279679fd6b91109 (patch)
tree	c71e82172b1ce887dc09e9b709b901561adb2e64 /sgemm.cpp
parent	47ae12bbec4eedf842ac067534e94bb1214ace73 (diff)

iqk_mul_mat: experimenting with zen4

Nope, we cannot have good performance for iq2_xxs and iq3_xxs at the same time. If I don't force-inline the sign functions, I get better performance for iq2_xxs and bad performance for iq3_xxs. If I force-inline them, it is the other way around.

Anyway, this is what we have now on Zen4 for all quants with force-inlined EvenSignHelper methods:

| model            |       size | threads |   test |           t/s |
| -----------------| ---------: | ------: | -----: | ------------: |
| llama 7B IQ3_S   |   2.75 GiB |      16 |  pp512 | 100.91 ± 0.26 |
| llama 7B IQ3_XXS |   2.41 GiB |      16 |  pp512 | 106.08 ± 0.78 |
| llama 7B IQ2_M   |   2.20 GiB |      16 |  pp512 | 116.41 ± 0.25 |
| llama 7B IQ2_XS  |   1.89 GiB |      16 |  pp512 | 132.54 ± 1.07 |
| llama 7B IQ2_XXS |   1.73 GiB |      16 |  pp512 | 125.53 ± 0.06 |

arithmetic mean: 116.29
geometric mean:  115.70

| model            |       size | threads |   test |           t/s |
| -----------------| ---------: | ------: | -----: | ------------: |
| llama 7B IQ3_S   |   2.75 GiB |       8 |  tg128 |  15.69 ± 0.04 |
| llama 7B IQ3_XXS |   2.41 GiB |       8 |  tg128 |  18.02 ± 0.04 |
| llama 7B IQ2_M   |   2.20 GiB |       8 |  tg128 |  18.94 ± 0.03 |
| llama 7B IQ2_XS  |   1.89 GiB |       8 |  tg128 |  23.29 ± 0.02 |
| llama 7B IQ2_XXS |   1.73 GiB |       8 |  tg128 |  22.96 ± 0.09 |

arithmetic mean: 19.78
geometric mean:  19.56

Without force-inlining, PP(iq3_xxs) drops to 98 t/s while PP(iq2_xxs) increases to 137 t/s.
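For readers unfamiliar with the trade-off, here is a minimal sketch of how forced inlining of a small helper can be toggled at build time so both variants can be benchmarked. Everything in it is an assumption for illustration: `SKETCH_FORCE_INLINE`, `SKETCH_INLINE`, `SignHelperSketch` and `signed_sum` are hypothetical names, and the helper body is not the actual EvenSignHelper code from sgemm.cpp. Only the `always_inline` attribute (GCC/Clang) and `__forceinline` (MSVC) are real compiler features.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical toggle, not from the repository: define SKETCH_FORCE_INLINE
// (e.g. with -DSKETCH_FORCE_INLINE) to force inlining, or leave it undefined
// to let the compiler decide.
#if defined(SKETCH_FORCE_INLINE)
#  if defined(_MSC_VER)
#    define SKETCH_INLINE __forceinline
#  else
#    define SKETCH_INLINE inline __attribute__((always_inline))
#  endif
#else
#  define SKETCH_INLINE inline
#endif

// Hypothetical stand-in for a sign helper: negates a value when the bit at
// position `index` of `sign_bits` is set.
struct SignHelperSketch {
    SKETCH_INLINE std::int8_t apply(std::int8_t value, std::uint32_t sign_bits, int index) const {
        return ((sign_bits >> index) & 1u) ? static_cast<std::int8_t>(-value) : value;
    }
};

// Small loop that calls the helper once per element. Whether the call is
// inlined here is what the macro above controls.
inline int signed_sum(const std::int8_t* values, int n, std::uint32_t sign_bits) {
    assert(n >= 0 && n <= 32);  // one sign bit per value in a 32-bit mask
    SignHelperSketch helper;
    int sum = 0;
    for (int i = 0; i < n; ++i) {
        sum += helper.apply(values[i], sign_bits, i);
    }
    return sum;
}
```

Building the same benchmark once with and once without the define gives the two variants compared above. Which one wins can differ per kernel, as the iq2_xxs and iq3_xxs numbers show, so the choice has to be measured rather than assumed.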

Diffstat (limited to 'sgemm.cpp')

0 files changed, 0 insertions, 0 deletions

