author | Iwan Kawrakow <iwan.kawrakow@gmail.com> | 2024-06-05 12:41:55 +0300 |
---|---|---|
committer | Iwan Kawrakow <iwan.kawrakow@gmail.com> | 2024-06-22 12:02:49 +0300 |
commit | e67626533cf8b1e63dcfb20c8279679fd6b91109 (patch) | |
tree | c71e82172b1ce887dc09e9b709b901561adb2e64 /sgemm.cpp | |
parent | 47ae12bbec4eedf842ac067534e94bb1214ace73 (diff) |
iqk_mul_mat: experimenting with zen4
Nope, we cannot have good performance for iq2_xxs and
iq3_xxs at the same time. If I don't force-inline
the sign functions, I get better performance for iq2_xxs
and bad performance for iq3_xxs. If I force-inline them,
it is the other way around. Anyway, this is what we have
now on Zen4 for all quants with force-inlined EvenSignHelper
methods:
| model | size | threads | test | t/s |
| -----------------| ---------: | ------: | -----: | ------------: |
| llama 7B IQ3_S | 2.75 GiB | 16 | pp512 | 100.91 ± 0.26 |
| llama 7B IQ3_XXS | 2.41 GiB | 16 | pp512 | 106.08 ± 0.78 |
| llama 7B IQ2_M | 2.20 GiB | 16 | pp512 | 116.41 ± 0.25 |
| llama 7B IQ2_XS | 1.89 GiB | 16 | pp512 | 132.54 ± 1.07 |
| llama 7B IQ2_XXS | 1.73 GiB | 16 | pp512 | 125.53 ± 0.06 |
arithmetic mean: 116.29
geometric mean: 115.70
| model | size | threads | test | t/s |
| -----------------| ---------: | ------: | -----: | ------------: |
| llama 7B IQ3_S | 2.75 GiB | 8 | tg128 | 15.69 ± 0.04 |
| llama 7B IQ3_XXS | 2.41 GiB | 8 | tg128 | 18.02 ± 0.04 |
| llama 7B IQ2_M | 2.20 GiB | 8 | tg128 | 18.94 ± 0.03 |
| llama 7B IQ2_XS | 1.89 GiB | 8 | tg128 | 23.29 ± 0.02 |
| llama 7B IQ2_XXS | 1.73 GiB | 8 | tg128 | 22.96 ± 0.09 |
arithmetic mean: 19.78
geometric mean: 19.56
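
The summary means quoted under each table can be reproduced with a few lines of C++. This is only a sketch for checking the numbers: the function names `arithmetic_mean` and `geometric_mean` and the vectors `pp512` and `tg128` are made up for illustration and are not part of sgemm.cpp; the values are copied from the tables above.

```cpp
#include <cmath>
#include <vector>

// Arithmetic mean: sum of the values divided by their count.
double arithmetic_mean(const std::vector<double>& v) {
    double sum = 0.0;
    for (double x : v) sum += x;
    return sum / static_cast<double>(v.size());
}

// Geometric mean, computed in log space as exp(mean(log(x))).
// Assumes all values are strictly positive, which holds for t/s figures.
double geometric_mean(const std::vector<double>& v) {
    double log_sum = 0.0;
    for (double x : v) log_sum += std::log(x);
    return std::exp(log_sum / static_cast<double>(v.size()));
}

// Throughput in t/s, in table order: IQ3_S, IQ3_XXS, IQ2_M, IQ2_XS, IQ2_XXS.
const std::vector<double> pp512 = {100.91, 106.08, 116.41, 132.54, 125.53};
const std::vector<double> tg128 = {15.69, 18.02, 18.94, 23.29, 22.96};
```

Calling both functions on `pp512` and `tg128` should agree with the quoted means after rounding to two decimals. The geometric mean is the less outlier-sensitive of the two, so a single fast quant type moves it less than it moves the arithmetic mean.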
Without force-inlining, PP(iq3_xxs) drops to 98 t/s while
PP(iq2_xxs) increases to 137 t/s.
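
For readers unfamiliar with the term, force-inlining is usually requested with a compiler-specific attribute rather than the plain `inline` keyword, which is only a hint. The sketch below shows one common way to do that. The `FORCE_INLINE` macro, the `SignHelperSketch` struct and its `apply` method are hypothetical stand-ins, not the actual `EvenSignHelper` code from sgemm.cpp, whose implementation is not shown in this commit view.

```cpp
#include <cstdint>

// Hypothetical portability macro (an assumption, not taken from sgemm.cpp).
#if defined(_MSC_VER)
#  define FORCE_INLINE __forceinline
#elif defined(__GNUC__) || defined(__clang__)
#  define FORCE_INLINE inline __attribute__((always_inline))
#else
#  define FORCE_INLINE inline  // fallback: ordinary inline hint only
#endif

// Hypothetical stand-in for a sign helper; the real EvenSignHelper differs.
struct SignHelperSketch {
    // Negate x when the selected bit of `signs` is set, else return x unchanged.
    FORCE_INLINE int32_t apply(int32_t x, uint32_t signs, int bit) const {
        return ((signs >> bit) & 1u) ? -x : x;
    }
};
```

Swapping `FORCE_INLINE` for plain `inline` leaves the decision to the compiler's heuristics, which is the kind of switch that produces the iq2_xxs versus iq3_xxs trade-off described above.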
Diffstat (limited to 'sgemm.cpp')
0 files changed, 0 insertions, 0 deletions