author | Kawrakow <iwankawrakow@gmail.com> | 2025-06-01 15:23:44 +0300 |
---|---|---|
committer | GitHub <noreply@github.com> | 2025-06-01 15:23:44 +0300 |
commit | 35374bc7e8de2b221ed4eabe426e05d8b9a7f99b (patch) | |
tree | f6e8438421d7c0a5971be7259a1581d116e47e79 /ggml/src/ggml-cann.cpp | |
parent | 7239ce6b35f0a1812bb54393f6a237c4f7cfe713 (diff) |
Metal implementation for the trellis quants. (#475)
* iq2_kt: Metal dequantize
* iq2_kt: Metal GEMV
Performance is actually quite decent: 52 t/s on my M2-Max for Llama-3.1-8B.
* iq3_kt: Metal dequantize
* iq3_kt: Metal GEMV
Performance is not as good as iq2_kt: 40 t/s on my M2-Max for Llama-3.1-8B.
Flipping signs is a costly affair.
* iq4_kt: Metal dequantize - getting NaNs
* iq4_kt: Metal GEMV - also not working
* iq4_kt: Metal still not working
* Disable iq4_kt on Metal for now
---------
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
Diffstat (limited to 'ggml/src/ggml-cann.cpp')
0 files changed, 0 insertions, 0 deletions