1.5 bit quantization (#5453)

* iq1_s: WIP basics
* iq1_s: CUDA is working
* iq1_s: scalar CPU dot product
* iq1_s: WIP AVX2 dot product - something is not right
* Fix tests
* Fix shadow warnings
* Fix after merge with latest master
* iq1_s: AVX2 finally works
* iq1_s: ARM_NEON dot product. Works, but not very fast
* iq1_s: better grid
* iq1_s: use IQ2_XXS for attn_output
  At a cost of 0.04 extra bpw this gives a big improvement in PPL.
* iq1_s: Metal basics
  Dequantize works, but not dot product
* iq1_s: Metal works, but quite slow
  As usual, Apple Silicon does not like the code I write.
* iq1_s: Tests
* iq1_s: slightly faster dot product

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
author: Kawrakow <48489457+ikawrakow@users.noreply.github.com> 2024-02-18 18:16:55 +0200
committer: GitHub <noreply@github.com> 2024-02-18 18:16:55 +0200
commit: bd2d4e393b2b7d2a1b2e201058e26017c9728ead (patch)
tree: 5c51109459cf1a25fc92fdb11d420895e16785ac /ggml-backend.c
parent: c8e0d7efeb7634ecc2e9832e879ab9fca4510e71 (diff)
1 file changed, 1 insertion, 1 deletion
diff --git a/ggml-backend.c b/ggml-backend.c
index 66e8c293..5076d9e5 100644
--- a/ggml-backend.c
+++ b/ggml-backend.c
@@ -756,7 +756,7 @@ GGML_CALL static bool ggml_backend_cpu_graph_compute(ggml_backend_t backend, str
 GGML_CALL static bool ggml_backend_cpu_supports_op(ggml_backend_t backend, const struct ggml_tensor * op) {
     switch (op->op) {
         case GGML_OP_CPY:
-            return op->type != GGML_TYPE_IQ2_XXS && op->type != GGML_TYPE_IQ2_XS; // missing type_traits.from_float
+            return op->type != GGML_TYPE_IQ2_XXS && op->type != GGML_TYPE_IQ2_XS && op->type != GGML_TYPE_IQ1_S; // missing type_traits.from_float
         case GGML_OP_MUL_MAT:
             return op->src[1]->type == GGML_TYPE_F32 || op->src[1]->type == ggml_internal_get_type_traits(op->src[0]->type).vec_dot_type;
         default:
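The hunk above makes the CPU backend report `GGML_OP_CPY` as unsupported when the destination type is IQ1_S, as it already did for IQ2_XXS and IQ2_XS, because these types have no `type_traits.from_float` converter. The unchanged `GGML_OP_MUL_MAT` case accepts an op only when `src[1]` is f32 or already in the `vec_dot_type` of `src[0]`. The following is a minimal, self-contained sketch of that decision logic. Every name in it (`sk_type`, `sk_tensor`, `sk_type_traits`, `sk_supports_op`, and the enum values) is hypothetical and is not the ggml API; the trait table, including the `vec_dot_type` entries, is a simplified assumption. Unlike the commit, which lists the excluded types explicitly, the sketch derives CPY support from whether `from_float` is NULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for ggml types and ops (not the real ggml API). */
typedef enum { T_F32, T_F16, T_Q8_K, T_IQ2_XXS, T_IQ2_XS, T_IQ1_S, T_COUNT } sk_type;
typedef enum { OP_CPY, OP_MUL_MAT, OP_ADD } sk_op;

typedef struct sk_tensor {
    sk_op op;                      /* operation that produces this tensor */
    sk_type type;                  /* element type of this tensor */
    const struct sk_tensor *src[2];
} sk_tensor;

/* Converter from f32 into a given type; NULL means the type cannot be produced from f32. */
typedef void (*sk_from_float_fn)(const float *x, void *y, int n);

typedef struct {
    sk_from_float_fn from_float;
    sk_type vec_dot_type;          /* assumed type the second matmul operand must have */
} sk_type_traits;

/* Placeholder converter; a real one would quantize n floats from x into y. */
static void sk_dummy_from_float(const float *x, void *y, int n) { (void)x; (void)y; (void)n; }

/* Assumed trait table: the IQ types have no from_float, mirroring the comment in the diff. */
static const sk_type_traits sk_traits[T_COUNT] = {
    [T_F32]     = { sk_dummy_from_float, T_F32  },
    [T_F16]     = { sk_dummy_from_float, T_F16  },
    [T_Q8_K]    = { sk_dummy_from_float, T_Q8_K },
    [T_IQ2_XXS] = { NULL,                T_Q8_K },
    [T_IQ2_XS]  = { NULL,                T_Q8_K },
    [T_IQ1_S]   = { NULL,                T_Q8_K },
};

static bool sk_supports_op(const sk_tensor *op) {
    assert(op->type < T_COUNT);
    switch (op->op) {
        case OP_CPY:
            /* Copying into op->type requires an f32 -> op->type converter. */
            return sk_traits[op->type].from_float != NULL;
        case OP_MUL_MAT:
            /* src[1] must be f32 or already in the vec_dot_type of src[0]'s type. */
            return op->src[1]->type == T_F32 ||
                   op->src[1]->type == sk_traits[op->src[0]->type].vec_dot_type;
        default:
            return true;
    }
}
```

With this table, a CPY into IQ1_S or IQ2_XS is rejected, a CPY into F16 is accepted, and a MUL_MAT with IQ1_S weights is accepted for f32 activations but rejected for f16 activations. Keying the CPY case on `from_float == NULL` means a new type without a converter is rejected automatically, instead of needing another `&&` clause like the one this commit adds.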