author | Kawrakow <48489457+ikawrakow@users.noreply.github.com> | 2024-01-11 20:39:39 +0100 |
---|---|---|
committer | GitHub <noreply@github.com> | 2024-01-11 21:39:39 +0200 |
commit | 49662cbed3e95f5976c070b85b9fd53fd577038d (patch) | |
tree | b70cd0956715bc11696f6e47d26788e24c5112c4 /tests/test-quantize-fns.cpp | |
parent | 3ba5b8ca8e6181a5c712c5b77595a29f1d3e2b97 (diff) |
ggml : SOTA 2-bit quants (add IQ2_XS) (#4856)
* iq2_xs: basics
* iq2_xs: this should have been in the basics
* iq2_xs: CUDA and scalar CPU works
* iq2_xs: WIP Metal
* iq2_xs: Metal now works
* iq2_xs: working, but dog slow, ARM_NEON dot product
* iq2_xs: better ARM_NEON dot product
We are now at 19.5 t/s for TG-128 and 61 t/s for PP-512 when
running on the CPU.
* iq2_xs: AVX2 dot product - 19.5 t/s
* iq2_xs: faster AVX2 dot product
21.4 t/s for TG-128, 59.2 t/s for PP-512.
The latter is 2x compared to the previous version.
* iq2_xs: had forgotten to delete iq2-data.h
* Add llama enum for IQ2_XS
---------
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
Diffstat (limited to 'tests/test-quantize-fns.cpp')
-rw-r--r-- | tests/test-quantize-fns.cpp | 5 |
1 files changed, 3 insertions, 2 deletions
diff --git a/tests/test-quantize-fns.cpp b/tests/test-quantize-fns.cpp
index cee71261..31a78c63 100644
--- a/tests/test-quantize-fns.cpp
+++ b/tests/test-quantize-fns.cpp
@@ -134,8 +134,9 @@ int main(int argc, char * argv[]) {
             continue;
         }
 
-        if ((ggml_type)i == GGML_TYPE_IQ2_XXS) {
-            printf("Skip %s due to missing quantization functionality\n", ggml_type_name((ggml_type) i));
+        const ggml_type ei = (ggml_type)i;
+        if (ei == GGML_TYPE_IQ2_XXS || ei == GGML_TYPE_IQ2_XS) {
+            printf("Skip %s due to missing quantization functionality\n", ggml_type_name(ei));
             continue;
         }
 
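The change to the test amounts to casting the loop index once and skipping both IQ2 types. A minimal standalone sketch of that pattern follows. It does not use ggml: `demo_type`, `demo_type_name`, `demo_should_skip`, and `demo_tested_types` are hypothetical stand-ins for illustration, not functions from the repository.

```cpp
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical stand-in for ggml_type; the real enum is defined by ggml.
enum demo_type { DEMO_Q4_0, DEMO_IQ2_XXS, DEMO_IQ2_XS, DEMO_COUNT };

// Hypothetical stand-in for ggml_type_name().
static const char * demo_type_name(demo_type t) {
    switch (t) {
        case DEMO_Q4_0:    return "q4_0";
        case DEMO_IQ2_XXS: return "iq2_xxs";
        case DEMO_IQ2_XS:  return "iq2_xs";
        default:           return "unknown";
    }
}

// Types the test loop should not exercise (assumed list, mirroring the diff).
static bool demo_should_skip(demo_type t) {
    return t == DEMO_IQ2_XXS || t == DEMO_IQ2_XS;
}

// Walks every type, prints a skip notice for excluded ones,
// and returns the names of the types that would be tested.
static std::vector<std::string> demo_tested_types() {
    std::vector<std::string> tested;
    for (int i = 0; i < DEMO_COUNT; i++) {
        const demo_type ei = (demo_type) i;  // cast once, reuse below
        if (demo_should_skip(ei)) {
            printf("Skip %s due to missing quantization functionality\n", demo_type_name(ei));
            continue;
        }
        tested.push_back(demo_type_name(ei));
    }
    return tested;
}
```

Calling `demo_tested_types()` from a test driver prints one skip line per excluded type and returns only the remaining names. Keeping the exclusion check in a single predicate means a future type can be added to the skip list in one place.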