author | Kawrakow <48489457+ikawrakow@users.noreply.github.com> | 2024-01-30 15:14:12 +0200 |
---|---|---|
committer | GitHub <noreply@github.com> | 2024-01-30 15:14:12 +0200 |
commit | f4d7e5497485ce6ce0e322533930b7da4657dd2d (patch) | |
tree | 78b30048cb4a9c78d5cf3e231a1ac3e9ed190577 /examples/quantize-stats/quantize-stats.cpp | |
parent | 2256f36b79a932a478d4dcdf02c1e5a60056e5f3 (diff) |
SOTA 3-bit quants (#5196)
* iq3_xxs: quantize/dequantize
RMSE seems a bit high-ish at about half-way between q2_K and
q3_K, so need to check more.
* iq3_xxs: CUDA dequantize works
* iq2_xxs: tuning quantization
* iq3_xxs: starting to look better
PPL on wiki.test.raw
LLaMA-v1-7B: 6.4218
LLaMA-v2-7B: 6.3560
Mistral-7B : 6.0717
This is better than Q3_K_XS, with a 5% reduction in quantized model
size.
* iq3_xxs: CUDA dot product
We have
PP-512: 5891 t/s
TG-128: 143.9 t/s
* iq3_xxs: scalar and AVX2 dot products
* iq3_xxs: ARM_NEON and Metal
Metal performance is decent, ARM_NEON is pathetic
* iq3_xxs: slightly better grid points
* Faster iq3_xxs and iq2_xs dot products on CUDA
* iq3_xxs: add some quant mix
* iq3_xxs: fix failing quantization test
Dot product still fails. Is this real?
* iq3_xxs: hopefully fix ROCm
* iq3_xxs: failing tests
This time the dot product accuracy did find an actual bug
in the AVX2 implementation.
* Add IQ3_XXS to test-backend-ops
---------
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
Diffstat (limited to 'examples/quantize-stats/quantize-stats.cpp')
-rw-r--r-- | examples/quantize-stats/quantize-stats.cpp | 2 |
1 files changed, 2 insertions, 0 deletions
diff --git a/examples/quantize-stats/quantize-stats.cpp b/examples/quantize-stats/quantize-stats.cpp
index 77302416..6d5f213d 100644
--- a/examples/quantize-stats/quantize-stats.cpp
+++ b/examples/quantize-stats/quantize-stats.cpp
@@ -378,6 +378,8 @@ int main(int argc, char ** argv) {
             printf("testing %s ...\n", ggml_type_name(type));
         }
 
+        ggml_quantize_init(type);
+
         error_stats global_stats {};
 
         for (const auto& kv_tensor : tensors) {
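
For readers unfamiliar with the RMSE figure mentioned in the commit message: the hunk above shows the tool accumulating an `error_stats` value per quantization type. The sketch below is only an illustration of how a round-trip RMSE can be computed. The names `toy_round_trip` and `rmse` are hypothetical and are not part of ggml, and the toy quantizer simply rounds to a fixed step; it is not the iq3_xxs scheme.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a quantize/dequantize round trip:
// each value is rounded to the nearest multiple of `step`.
// Assumption: this is NOT iq3_xxs, only a placeholder for illustration.
std::vector<float> toy_round_trip(const std::vector<float> & x, float step) {
    std::vector<float> y(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        y[i] = std::round(x[i] / step) * step;
    }
    return y;
}

// Root-mean-square error between the original values and the
// round-tripped values, accumulated in double precision.
double rmse(const std::vector<float> & a, const std::vector<float> & b) {
    assert(a.size() == b.size() && !a.empty());
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double d = double(a[i]) - double(b[i]);
        sum += d * d;
    }
    return std::sqrt(sum / double(a.size()));
}
```

Calling `rmse(x, toy_round_trip(x, step))` with a smaller `step` should give a smaller error, which is the kind of comparison the commit message makes between q2_K, iq3_xxs and q3_K.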