| author | Iwan Kawrakow <iwan.kawrakow@gmail.com> | 2024-06-20 18:39:31 +0300 | 
|---|---|---|
| committer | Iwan Kawrakow <iwan.kawrakow@gmail.com> | 2024-06-22 12:02:52 +0300 | 
| commit | f0325c5826c55bb9796485d49bc971a17735e96a (patch) | |
| tree | e70069ee59e64f3882468cc65f09831ae266d744 /examples/rpc | |
| parent | e05cca9ef652eee7b42927485a3821b14e3c565f (diff) | |
bitnet(scale in a separate tensor): more CPU improvements
It seems it is enough to have 4 scales per row for Q8.
I get PPL = 8.5470 with this, which is slightly higher than
the 8.5430 we get with 1 scale per 128 activations, but still
OK, I think.
With this, we get the following performance:
System    | quant |     PP-512    |    TG-128    | quant |     PP-512    |    TG-128
M2 Max    | iq2bn | 229.02 ± 0.37 | 78.75 ± 0.61 | iq1bn | 146.67 ± 2.85 | 33.12 ± 0.03
Ryzen7950 | iq2bn | 379.36 ± 1.03 | 49.08 ± 0.18 | iq1bn | 247.12 ± 1.53 | 32.80 ± 0.02
Ryzen5975 | iq2bn | 465.28 ± 0.57 | 39.17 ± 0.02 | iq1bn | 325.86 ± 0.46 | 26.60 ± 0.10
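
For illustration only, here is a rough sketch of what "4 scales per row for Q8" could look like. It is not the kernel from this commit: the function names, the absmax/127 scaling rule, and the equal-sized split of the row are all assumptions made for the example.

```python
def quantize_row_q8(row, n_scales=4):
    """Quantize one row of floats to int8 values using n_scales equal blocks.

    Hypothetical helper: each block gets its own scale d = max|x| / 127.
    """
    if len(row) % n_scales != 0:
        raise ValueError("row length must be divisible by n_scales")
    block = len(row) // n_scales
    scales, quants = [], []
    for b in range(n_scales):
        chunk = row[b * block:(b + 1) * block]
        amax = max(abs(v) for v in chunk)
        # An all-zero block would give d = 0; use 1.0 to avoid dividing by zero.
        d = amax / 127.0 if amax > 0 else 1.0
        scales.append(d)
        # Round to the nearest integer and clamp to the symmetric int8 range.
        quants.extend(max(-127, min(127, round(v / d))) for v in chunk)
    return scales, quants


def dequantize_row_q8(scales, quants):
    """Rebuild approximate floats from the per-block scales and int8 values."""
    block = len(quants) // len(scales)
    return [scales[i // block] * q for i, q in enumerate(quants)]
```

With this layout the number of scales per row stays at 4 regardless of row length, whereas 1 scale per 128 activations grows with the row: a hypothetical row of 4096 activations would carry 32 scales instead of 4.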
Diffstat (limited to 'examples/rpc')
0 files changed, 0 insertions, 0 deletions
