ik_llama.cpp.git

author	Kawrakow <48489457+ikawrakow@users.noreply.github.com>	2024-08-19 13:36:51 +0300
committer	GitHub <noreply@github.com>	2024-08-19 13:36:51 +0300
commit	c7b47fc67f23d1296b5b803337c27d8534373161 (patch)
tree	bc846d25dace4d036ad0d19374fcbd8c67ca0c5a /examples/quantize-stats/quantize-stats.cpp
parent	6c5384f20e8657a23aa9d4e0e9856d3d7563a12a (diff)

iq2_k: slightly better bpw - accuracy compromise (#20)

For LLaMA-3.1 models:
* It is better to quantize all of attn_v with iq3_k instead of half of attn_v with iq4_k
* Quantizing attn_output with iq3_k results in a larger PPL decrease compared to what one expects from the added bpw.

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>

Diffstat (limited to 'examples/quantize-stats/quantize-stats.cpp')

0 files changed, 0 insertions, 0 deletions

