ik_llama.cpp.git

author	Kawrakow <48489457+ikawrakow@users.noreply.github.com>	2024-02-05 10:46:06 +0200
committer	GitHub <noreply@github.com>	2024-02-05 10:46:06 +0200
commit	6fdfa2ecc684000a25a4ad91823bc82a6652b645 (patch)
tree	c98969391003efff3b83b4ede0a50759b80fa3ab /gguf-py/scripts/__init__.py
parent	a2d60c9158435ae9a6f14632f07f1acf7a3becef (diff)

iq2_xxs: tune quantization (#5320)

We get slightly better PPL and cut quantization time nearly in half. The trick is to first quantize without forcing points onto the E8 lattice. We can then use a narrower search range around the block scale obtained that way.

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
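The two-stage idea in the message can be illustrated with a small toy sketch. This is not the code from the commit: the level set, the group size of 4, the `CODEBOOK` rule (a stand-in for the E8-lattice constraint), the search ranges, and all function names are assumptions chosen only to show the shape of the technique. Only non-negative magnitudes are handled. Stage 1 runs a wide scale search with cheap per-value rounding; stage 2 runs the costlier codebook-constrained search over a few scales near the stage 1 result.

```python
import itertools

# Hypothetical toy setup, not the real iq2_xxs grid.
LEVELS = (1, 3, 5)
# Toy stand-in for a lattice constraint: keep only groups whose sum is divisible by 4.
CODEBOOK = [q for q in itertools.product(LEVELS, repeat=4) if sum(q) % 4 == 0]


def sq_err(x, q, scale):
    """Squared reconstruction error of x approximated by scale * q."""
    return sum((xi - scale * qi) ** 2 for xi, qi in zip(x, q))


def nearest_free(x, scale):
    """Round each value independently to the nearest level (no codebook constraint)."""
    return tuple(min(LEVELS, key=lambda lv: abs(xi - scale * lv)) for xi in x)


def nearest_on_codebook(x, scale):
    """Pick the codebook entry with the smallest error (exhaustive, so costlier)."""
    return min(CODEBOOK, key=lambda q: sq_err(x, q, scale))


def best_scale(x, candidates, quantize):
    """Return the candidate scale that minimizes the error under the given quantizer."""
    return min(candidates, key=lambda s: sq_err(x, quantize(x, s), s))


def two_stage(x, wide_steps=32, narrow_steps=5):
    """Wide unconstrained scale search, then a narrow constrained one around its result."""
    base = max(abs(v) for v in x) / LEVELS[-1]  # assumes x is not all zeros
    wide = [base * (0.5 + i / wide_steps) for i in range(wide_steps + 1)]
    s0 = best_scale(x, wide, nearest_free)
    narrow = [s0 * (1 + 0.05 * k) for k in range(-narrow_steps, narrow_steps + 1)]
    s1 = best_scale(x, narrow, nearest_on_codebook)
    return s1, nearest_on_codebook(x, s1)


scale, q = two_stage([0.9, 3.1, 5.2, 2.8])
```

In this sketch the constrained search evaluates 11 candidate scales instead of 33, which is where the time saving would come from if the constrained step dominates the cost.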

Diffstat (limited to 'gguf-py/scripts/__init__.py')

0 files changed, 0 insertions, 0 deletions

