Convert models to row-interleaved quants using the quantize tool (#272)

* Repack a model with the quantize tool

* WIP

* Fixed various issues

As we don't have a way to tell if a repacked quant has been modified,
I had to remove the modification at the expense of a slight decrease
in performance. This affects q8_0_r8, q8_KV_r8, q8_k_r8 on Zen4, and
q4_0_r8 on ARM.

* Create wk_b and wv_b as Q8_0_R8 if the wkv_b type is interleaved

* Fix GCC 13.3 compilation error

* Another one

* Add missing include

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
author: Kawrakow <iwankawrakow@gmail.com> 2025-03-21 07:23:36 +0100
committer: GitHub <noreply@github.com> 2025-03-21 07:23:36 +0100
commit: b8d1fac97b756968b86b470d44bb1026ded7157a (patch)
tree: 5a5893796293475185e833a787648830a7189450 /ggml/src/iqk/iqk_quantize.h
parent: 127c6ee6493a3084995d754d987f0240ffdffe6a (diff)
1 files changed, 3 insertions, 0 deletions
diff --git a/ggml/src/iqk/iqk_quantize.h b/ggml/src/iqk/iqk_quantize.h
index d447705b..dd148f2e 100644
--- a/ggml/src/iqk/iqk_quantize.h
+++ b/ggml/src/iqk/iqk_quantize.h
@@ -245,6 +245,9 @@ void repack_bf16_bf16_r16(const void * GGML_RESTRICT src, void * GGML_RESTRICT d
 void iqk_repack_tensor(struct ggml_tensor * tensor);
 bool iqk_modify_tensor(struct ggml_tensor * tensor);
 
+int iqk_repacked_type(const struct ggml_tensor * tensor); // int instead of ggml_type so we don't need to include ggml.h
+bool iqk_should_modify_tensor(const struct ggml_tensor * tensor);
+
 // So we can re-pack Microsoft's BitNet I2_S quants
 void dequantize_row_ms_i2s(const void * GGML_RESTRICT x, float * GGML_RESTRICT y, int64_t k);
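
The comment on `iqk_repacked_type` says it returns `int` rather than `ggml_type` so that the header does not have to include `ggml.h`. The sketch below illustrates that general pattern: a struct can be forward declared and passed by pointer, but standard C has no forward declaration for an enum, so a function returning the enum would need its full definition. All names here (`demo_tensor`, `demo_type`, `demo_repacked_type`) and the type mapping are hypothetical stand-ins for illustration, not the actual ggml definitions or the logic added by this commit.

```c
#include <stddef.h>

/* --- What a lightweight header could contain (hypothetical names) --- */

/* Forward declaration is enough to pass a pointer around. */
struct demo_tensor;

/* Returns the enum value as an int, so this declaration does not
 * require the enum definition to be visible. */
int demo_repacked_type(const struct demo_tensor * tensor);

/* --- What the implementation file could contain --- */

/* Assumed, simplified stand-ins for the real type enum and tensor. */
enum demo_type { DEMO_Q8_0 = 0, DEMO_Q8_0_R8 = 1, DEMO_F32 = 2 };
struct demo_tensor { enum demo_type type; };

int demo_repacked_type(const struct demo_tensor * tensor) {
    switch (tensor->type) {
        /* Invented mapping: a plain quant maps to its interleaved variant. */
        case DEMO_Q8_0: return (int)DEMO_Q8_0_R8;
        /* Types without an interleaved variant are returned unchanged. */
        default:        return (int)tensor->type;
    }
}
```

A caller that does see the enum definition can cast the result back, for example `(enum demo_type)demo_repacked_type(&t)`. The trade-off is a loss of type checking at the header boundary in exchange for a smaller include graph.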