author | Iwan Kawrakow <iwan.kawrakow@gmail.com> | 2024-06-20 18:39:31 +0300 |
---|---|---|
committer | Iwan Kawrakow <iwan.kawrakow@gmail.com> | 2024-06-22 12:02:52 +0300 |
commit | f0325c5826c55bb9796485d49bc971a17735e96a (patch) | |
tree | e70069ee59e64f3882468cc65f09831ae266d744 /tests/test-double-float.cpp | |
parent | e05cca9ef652eee7b42927485a3821b14e3c565f (diff) |
bitnet(scale in a separate tensor): more CPU improvements
It seems it is enough to have 4 scales per row for Q8.
I get PPL = 8.5470 with this, which is slightly higher than
the 8.5430 we get with 1 scale per 128 activations, but still
OK, I think.
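
To make "4 scales per row" concrete, here is a minimal sketch of quantizing one activation row to int8 with a small, fixed number of scales. It is not the code from this commit: `QuantizedRow`, `quantize_row_q8`, and `dequantize` are hypothetical names, and symmetric absmax scaling over equal-sized blocks is an assumption made for illustration only.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container: int8 quants for one activation row plus a few float scales.
struct QuantizedRow {
    std::vector<int8_t> q;      // one int8 value per activation
    std::vector<float>  scales; // one scale per block of the row
};

// Quantize a row with n_scales equal-sized blocks (4 per row by default).
// Assumption of this sketch: symmetric absmax quantization, d = max|x| / 127.
QuantizedRow quantize_row_q8(const std::vector<float>& x, int n_scales = 4) {
    assert(n_scales > 0 && x.size() % static_cast<std::size_t>(n_scales) == 0);
    const std::size_t block = x.size() / n_scales;
    QuantizedRow out;
    out.q.resize(x.size());
    out.scales.resize(n_scales);
    for (int b = 0; b < n_scales; ++b) {
        const std::size_t start = b * block;
        // Largest magnitude in this block determines its scale.
        float amax = 0.0f;
        for (std::size_t i = 0; i < block; ++i) {
            amax = std::max(amax, std::fabs(x[start + i]));
        }
        const float d  = amax / 127.0f;
        const float id = d > 0.0f ? 1.0f / d : 0.0f; // all-zero block: quants stay 0
        out.scales[b] = d;
        for (std::size_t i = 0; i < block; ++i) {
            out.q[start + i] = static_cast<int8_t>(std::lround(x[start + i] * id));
        }
    }
    return out;
}

// Reconstruct activation i from its quant and the scale of the block it belongs to.
float dequantize(const QuantizedRow& r, std::size_t i) {
    const std::size_t block = r.q.size() / r.scales.size();
    return r.scales[i / block] * static_cast<float>(r.q[i]);
}
```

With a fixed number of scales per row, each block grows with the row length, so a single outlier coarsens the quantization of more activations than it would with 1 scale per 128 activations. That is one plausible reading of the small PPL difference quoted above, not something the commit message states.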
With this, we get the following performance:
System | quant | PP-512 | TG-128 | quant | PP-512 | TG-128 |
---|---|---|---|---|---|---|
M2 Max | iq2bn | 229.02 ± 0.37 | 78.75 ± 0.61 | iq1bn | 146.67 ± 2.85 | 33.12 ± 0.03 |
Ryzen7950 | iq2bn | 379.36 ± 1.03 | 49.08 ± 0.18 | iq1bn | 247.12 ± 1.53 | 32.80 ± 0.02 |
Ryzen5975 | iq2bn | 465.28 ± 0.57 | 39.17 ± 0.02 | iq1bn | 325.86 ± 0.46 | 26.60 ± 0.10 |
Diffstat (limited to 'tests/test-double-float.cpp')
0 files changed, 0 insertions, 0 deletions