author Iwan Kawrakow <iwan.kawrakow@gmail.com> 2024-06-20 18:39:31 +0300
committer Iwan Kawrakow <iwan.kawrakow@gmail.com> 2024-06-22 12:02:52 +0300
commit f0325c5826c55bb9796485d49bc971a17735e96a
tree e70069ee59e64f3882468cc65f09831ae266d744 /tests/get-model.cpp
parent e05cca9ef652eee7b42927485a3821b14e3c565f
bitnet(scale in a separate tensor): more CPU improvements
It seems it is enough to have 4 scales per row for Q8. I get PPL = 8.5470 with this, which is slightly higher than the 8.5430 we get with 1 scale per 128 activations, but still OK, I think.

With this, we get the following performance:

| System    | quant | PP-512        | TG-128       | quant | PP-512        | TG-128       |
| M2 Max    | iq2bn | 229.02 ± 0.37 | 78.75 ± 0.61 | iq1bn | 146.67 ± 2.85 | 33.12 ± 0.03 |
| Ryzen7950 | iq2bn | 379.36 ± 1.03 | 49.08 ± 0.18 | iq1bn | 247.12 ± 1.53 | 32.80 ± 0.02 |
| Ryzen5975 | iq2bn | 465.28 ± 0.57 | 39.17 ± 0.02 | iq1bn | 325.86 ± 0.46 | 26.60 ± 0.10 |
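
To illustrate what "4 scales per row" could mean for the Q8 activations, here is a minimal sketch. It is not the code from this commit (the diff for this file is empty); the names `QuantizedRow` and `quantize_row_q8` are hypothetical, and the absmax/127 scaling is an assumption about how each block's scale is chosen. The row is split into `n_scales` equal blocks and each block gets its own scale, so `n_scales = 4` gives four scales per row, while `n_scales = row_length / 128` would correspond to one scale per 128 activations.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container: int8 values plus one float scale per block.
struct QuantizedRow {
    std::vector<int8_t> q;       // quantized activations, same length as the input
    std::vector<float>  scales;  // one scale per block
};

// Hypothetical helper: quantize one row of activations to int8 using
// n_scales equal-sized blocks. Assumes x.size() is divisible by n_scales.
QuantizedRow quantize_row_q8(const std::vector<float>& x, int n_scales = 4) {
    assert(n_scales > 0 && x.size() % static_cast<std::size_t>(n_scales) == 0);
    const std::size_t block = x.size() / n_scales;
    QuantizedRow out{std::vector<int8_t>(x.size()), std::vector<float>(n_scales)};
    for (int b = 0; b < n_scales; ++b) {
        const std::size_t begin = b * block, end = begin + block;
        // Assumption: the scale is the block's largest magnitude mapped to 127.
        float amax = 0.0f;
        for (std::size_t i = begin; i < end; ++i) amax = std::max(amax, std::fabs(x[i]));
        const float d  = amax / 127.0f;
        const float id = d > 0.0f ? 1.0f / d : 0.0f;  // all-zero block: scale 0
        out.scales[b] = d;
        for (std::size_t i = begin; i < end; ++i) {
            const long v = std::lround(x[i] * id);
            out.q[i] = static_cast<int8_t>(std::clamp(v, -127L, 127L));
        }
    }
    return out;
}
```

An activation is recovered approximately as `scales[i / block] * q[i]`. Fewer scales per row mean less bookkeeping in the dot product, at the cost of coarser quantization when magnitudes vary a lot within a block, which is consistent with the small PPL increase reported above.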
Diffstat (limited to 'tests/get-model.cpp')
0 files changed, 0 insertions, 0 deletions