author | Kawrakow <48489457+ikawrakow@users.noreply.github.com> | 2024-02-18 18:16:55 +0200
---|---|---
committer | GitHub <noreply@github.com> | 2024-02-18 18:16:55 +0200
commit | bd2d4e393b2b7d2a1b2e201058e26017c9728ead |
tree | 5c51109459cf1a25fc92fdb11d420895e16785ac /ggml.h |
parent | c8e0d7efeb7634ecc2e9832e879ab9fca4510e71 |
1.5 bit quantization (#5453)
* iq1_s: WIP basics
* iq1_s: CUDA is working
* iq1_s: scalar CPU dot product
* iq1_s: WIP AVX2 dot product - something is not right
* Fix tests
* Fix shadow warnings
* Fix after merge with latest master
* iq1_s: AVX2 finally works
* iq1_s: ARM_NEON dot product. Works, but not very fast
* iq1_s: better grid
* iq1_s: use IQ2_XXS for attn_output
At a cost of 0.04 extra bpw this gives a big improvement in PPL.
* iq1_s: Metal basics
Dequantize works, but not dot product
* iq1_s: Metal works, but quite slow
As usual, Apple Silicon does not like the code I write.
* iq1_s: Tests
* iq1_s: slightly faster dot product
---------
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
Diffstat (limited to 'ggml.h')
-rw-r--r-- | ggml.h | 2 |
1 files changed, 2 insertions, 0 deletions
@@ -354,6 +354,7 @@ extern "C" {
         GGML_TYPE_IQ2_XXS = 16,
         GGML_TYPE_IQ2_XS = 17,
         GGML_TYPE_IQ3_XXS = 18,
+        GGML_TYPE_IQ1_S = 19,
         GGML_TYPE_I8,
         GGML_TYPE_I16,
         GGML_TYPE_I32,
@@ -391,6 +392,7 @@ extern "C" {
         GGML_FTYPE_MOSTLY_IQ2_XXS = 15, // except 1d tensors
         GGML_FTYPE_MOSTLY_IQ2_XS = 16, // except 1d tensors
         GGML_FTYPE_MOSTLY_IQ3_XXS = 17, // except 1d tensors
+        GGML_FTYPE_MOSTLY_IQ1_S = 18, // except 1d tensors
     };
 
     // available tensor operations:
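The two hunks touch enums that mix explicit and implicit values, so the new `= 19` entry also changes what the following unnumbered enumerators evaluate to. The sketch below illustrates that C rule with a hypothetical mirror of only the enumerators visible in this diff. The `DEMO_` names are made up for illustration and are not ggml identifiers, and the real enums contain more entries than shown here.

```c
#include <assert.h>

/* Hypothetical mirror of the enumerators visible in the first hunk.
   The DEMO_ prefix marks these as illustrative, not as ggml names. */
enum demo_type {
    DEMO_TYPE_IQ2_XXS = 16,
    DEMO_TYPE_IQ2_XS  = 17,
    DEMO_TYPE_IQ3_XXS = 18,
    DEMO_TYPE_IQ1_S   = 19, /* the line added by this commit */
    DEMO_TYPE_I8,           /* implicit: previous enumerator + 1, so 20 */
    DEMO_TYPE_I16,          /* 21 */
    DEMO_TYPE_I32,          /* 22 */
};

/* Hypothetical mirror of the enumerators visible in the second hunk. */
enum demo_ftype {
    DEMO_FTYPE_MOSTLY_IQ2_XXS = 15,
    DEMO_FTYPE_MOSTLY_IQ2_XS  = 16,
    DEMO_FTYPE_MOSTLY_IQ3_XXS = 17,
    DEMO_FTYPE_MOSTLY_IQ1_S   = 18, /* the line added by this commit */
};

/* Compile-time checks of the C enumeration rule described above. */
static_assert(DEMO_TYPE_I8 == 20,
              "an enumerator without '=' is the previous one plus 1");
static_assert(DEMO_TYPE_I32 == 22,
              "implicit numbering continues from the last explicit value");
static_assert((int)DEMO_TYPE_IQ1_S != (int)DEMO_FTYPE_MOSTLY_IQ1_S,
              "the type id (19) and the file type id (18) are different numbers");
```

Before the insertion, the enumerator after `= 18` would have taken the value 19, so the unnumbered integer types each move up by one. The diff itself does not show whether any code or file format depends on those raw values. It is also worth noting that the tensor type and the file type for the same quantization use different numbers, so the two enums cannot be used interchangeably.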