author | Kawrakow <48489457+ikawrakow@users.noreply.github.com> | 2024-08-19 13:36:51 +0300
---|---|---
committer | GitHub <noreply@github.com> | 2024-08-19 13:36:51 +0300
commit | c7b47fc67f23d1296b5b803337c27d8534373161 |
tree | bc846d25dace4d036ad0d19374fcbd8c67ca0c5a /examples/quantize-stats/quantize-stats.cpp |
parent | 6c5384f20e8657a23aa9d4e0e9856d3d7563a12a |
iq2_k: slightly better bpw - accuracy compromise (#20)
For LLaMA-3.1 models:
* It is better to quantize all of attn_v with iq3_k instead of
half of attn_v with iq4_k
* Quantizing attn_output with iq3_k results in a larger PPL decrease
compared to what one expects from the added bpw.
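
The two points above can be read as a per-tensor type selection rule. The following is only a minimal sketch of that idea, not the code from this commit: `QuantType`, `has`, `pick_type_before` and `pick_type_after` are hypothetical names, the choice of even layers to model "half of attn_v" is an assumption, and so is the simplification that every other tensor, including attn_output before the change, uses iq2_k. Tensor names are assumed to follow the `blk.N.attn_v.weight` pattern.

```cpp
#include <string>

// Hypothetical stand-ins for the quantization types named in the commit message.
enum class QuantType { IQ2_K, IQ3_K, IQ4_K };

// True if the tensor name contains the given key, e.g. "attn_v" in "blk.7.attn_v.weight".
static bool has(const std::string& name, const std::string& key) {
    return name.find(key) != std::string::npos;
}

// Sketch of the rule before the change: half of the attn_v tensors get iq4_k.
// ASSUMPTION: "half" is modelled as the even-numbered layers; the commit message
// does not say which half was used.
// ASSUMPTION: everything else, including attn_output, stays at the base iq2_k type.
QuantType pick_type_before(const std::string& name, int layer) {
    if (has(name, "attn_v") && layer % 2 == 0) return QuantType::IQ4_K;
    return QuantType::IQ2_K;
}

// Sketch of the rule after the change: every attn_v tensor gets iq3_k,
// and attn_output is quantized with iq3_k as well.
QuantType pick_type_after(const std::string& name) {
    if (has(name, "attn_v") || has(name, "attn_output")) return QuantType::IQ3_K;
    return QuantType::IQ2_K;
}
```

Under these assumptions, `pick_type_after("blk.1.attn_v.weight")` yields `QuantType::IQ3_K` for every layer, whereas `pick_type_before` yields `QuantType::IQ4_K` only for even layers and falls back to `QuantType::IQ2_K` for the rest.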
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
Diffstat (limited to 'examples/quantize-stats/quantize-stats.cpp')
0 files changed, 0 insertions, 0 deletions