author: Kawrakow <48489457+ikawrakow@users.noreply.github.com> 2024-08-19 13:36:51 +0300
committer: GitHub <noreply@github.com> 2024-08-19 13:36:51 +0300
commit: c7b47fc67f23d1296b5b803337c27d8534373161
tree: bc846d25dace4d036ad0d19374fcbd8c67ca0c5a /examples/quantize-stats/quantize-stats.cpp
parent: 6c5384f20e8657a23aa9d4e0e9856d3d7563a12a
iq2_k: slightly better bpw - accuracy compromise (#20)
For LLaMA-3.1 models:
* It is better to quantize all of attn_v with iq3_k instead of half of attn_v with iq4_k
* Quantizing attn_output with iq3_k results in a larger PPL decrease compared to what one expects from the added bpw.

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
Diffstat (limited to 'examples/quantize-stats/quantize-stats.cpp')
0 files changed, 0 insertions, 0 deletions