| author | Kawrakow <48489457+ikawrakow@users.noreply.github.com> | 2024-02-05 10:46:06 +0200 |
|---|---|---|
| committer | GitHub <noreply@github.com> | 2024-02-05 10:46:06 +0200 |
| commit | 6fdfa2ecc684000a25a4ad91823bc82a6652b645 (patch) | |
| tree | c98969391003efff3b83b4ede0a50759b80fa3ab /gguf-py/scripts/__init__.py | |
| parent | a2d60c9158435ae9a6f14632f07f1acf7a3becef (diff) | |
iq2_xxs: tune quantization (#5320)
We get slightly better perplexity (PPL), and we cut quantization time
nearly in half.
The trick is to first quantize without forcing points onto the E8 lattice.
We can then use a narrower search range around the block scale obtained
that way.
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
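
The two-stage idea in the commit message can be illustrated with a small, self-contained sketch. This is not the code from the commit: the magnitude levels `{1, 3, 5}`, the even-sign-parity rule (a toy stand-in for the real E8-lattice constraint), the function names, and the step counts are all assumptions chosen for illustration. It only shows the shape of the technique: a wide, cheap scale search with unconstrained rounding, followed by a narrow search around that scale with the constraint enforced.

```python
def sq_error(x, q, scale):
    """Squared reconstruction error of block x from codes q and a scale."""
    return sum((v - scale * qi) ** 2 for v, qi in zip(x, q))


def quantize_free(x, scale):
    """Unconstrained rounding: nearest magnitude in {1, 3, 5}, sign kept."""
    levels = (1, 3, 5)  # hypothetical magnitude levels
    q = []
    for v in x:
        mag = min(levels, key=lambda l: abs(abs(v) / scale - l))
        q.append(mag if v >= 0 else -mag)
    return q


def quantize_constrained(x, scale):
    """Toy constraint (NOT the real E8 codebook): even number of negatives."""
    q = quantize_free(x, scale)
    if sum(1 for qi in q if qi < 0) % 2 == 1:
        # Flip the one sign whose flip increases the error the least.
        def flip_cost(i):
            return (x[i] + scale * q[i]) ** 2 - (x[i] - scale * q[i]) ** 2
        i = min(range(len(q)), key=flip_cost)
        q[i] = -q[i]
    return q


def search_scale(x, quantize, candidates):
    """Return (error, scale, codes) for the best candidate scale."""
    best = None
    for s in candidates:
        q = quantize(x, s)
        err = sq_error(x, q, s)
        if best is None or err < best[0]:
            best = (err, s, q)
    return best


def two_stage_quantize(x, coarse_steps=32, fine_steps=5, fine_step=0.02):
    """Stage 1: wide search, no constraint. Stage 2: narrow search, constrained."""
    max_abs = max(abs(v) for v in x)
    if max_abs == 0.0:
        return 0.0, [1] * len(x)  # all-zero block: any codes reconstruct it
    base = max_abs / 5
    # Stage 1: scales from 0.5x to 1.5x of the naive scale, cheap rounding.
    coarse = [base * (0.5 + i / coarse_steps) for i in range(coarse_steps + 1)]
    _, s0, _ = search_scale(x, quantize_free, coarse)
    # Stage 2: only a few scales close to s0, with the constraint applied.
    fine = [s0 * (1 + fine_step * k) for k in range(-fine_steps, fine_steps + 1)]
    _, s, q = search_scale(x, quantize_constrained, fine)
    return s, q
```

In this sketch the expensive constrained step runs for only `2 * fine_steps + 1` candidate scales instead of the full range, which is the kind of saving the commit message attributes to the narrower search range.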
Diffstat (limited to 'gguf-py/scripts/__init__.py')
0 files changed, 0 insertions, 0 deletions
