summaryrefslogtreecommitdiff
path: root/llama.h
diff options
context:
space:
mode:
authorKawrakow <48489457+ikawrakow@users.noreply.github.com>2024-01-11 20:39:39 +0100
committerGitHub <noreply@github.com>2024-01-11 21:39:39 +0200
commit49662cbed3e95f5976c070b85b9fd53fd577038d (patch)
treeb70cd0956715bc11696f6e47d26788e24c5112c4 /llama.h
parent3ba5b8ca8e6181a5c712c5b77595a29f1d3e2b97 (diff)
ggml : SOTA 2-bit quants (add IQ2_XS) (#4856)
* iq2_xs: basics * iq2_xs: this should have been in the basics * iq2_xs: CUDA and scalar CPU works * iq2_xs: WIP Metal * iq2_xs: Metal now works * iq2_xs: working, but dog slow, ARM_NEON dot product * iq2_xs: better ARM_NEON dot product We are now at 19.5 t/s for TG-128 and 61 t/s for PP-512 when running on the CPU. * iq2_xs: AVX2 dot product - 19.5 t/s * iq2_xs: faster AVX2 dit product 21.4 t/s for TG-128, 59.2 t/s for PP-512. The latter is 2x compared to the previous version. * iq2_xs: had forgotten to delete iq2-data.h * Add llama enum for IQ2_XS --------- Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
Diffstat (limited to 'llama.h')
-rw-r--r--llama.h1
1 files changed, 1 insertions, 0 deletions
diff --git a/llama.h b/llama.h
index c11075bb..6fde113f 100644
--- a/llama.h
+++ b/llama.h
@@ -104,6 +104,7 @@ extern "C" {
LLAMA_FTYPE_MOSTLY_Q5_K_M = 17, // except 1d tensors
LLAMA_FTYPE_MOSTLY_Q6_K = 18, // except 1d tensors
LLAMA_FTYPE_MOSTLY_IQ2_XXS = 19, // except 1d tensors
+ LLAMA_FTYPE_MOSTLY_IQ2_XS = 20, // except 1d tensors
LLAMA_FTYPE_GUESSED = 1024, // not specified in the model file
};