| author | Kawrakow <48489457+ikawrakow@users.noreply.github.com> | 2024-03-21 08:27:57 +0100 |
|---|---|---|
| committer | GitHub <noreply@github.com> | 2024-03-21 08:27:57 +0100 |
| commit | 76aa30a26353f597e4fbe3cf776772ae812af89a (patch) | |
| tree | 35654d27aa0f3fd656aa5cab1125999c13ae5201 /examples | |
| parent | c5b8595e3f4f4ed319ef71c9c9d868d1b7a27626 (diff) | |
Add ability to use Q5_0, Q5_1, and IQ4_NL for quantized K cache (#6183)
* k_cache: be able to use Q5_0
* k_cache: be able to use Q5_1 on CUDA
* k_cache: be able to use Q5_0 on Metal
* k_cache: be able to use Q5_1 on Metal
* k_cache: be able to use IQ4_NL - just CUDA for now
* k_cache: be able to use IQ4_NL on Metal
* k_cache: add newly added supported types to llama-bench and CUDA supports_op
---------
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
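For context, the new types are selected the same way as the existing quantized K-cache types. Below is a minimal sketch, not part of this commit, assuming the llama.cpp C API roughly as it stood at this revision (`llama_context_default_params()` with `type_k`/`type_v` fields); the model path is a placeholder.

```cpp
#include "llama.h"

int main() {
    llama_backend_init();

    // "model.gguf" is a placeholder path, not something from this commit.
    llama_model_params mparams = llama_model_default_params();
    llama_model * model = llama_load_model_from_file("model.gguf", mparams);
    if (model == nullptr) {
        return 1;
    }

    llama_context_params cparams = llama_context_default_params();
    // Newly supported K-cache type; GGML_TYPE_Q5_0 / GGML_TYPE_Q5_1 are selected the same way.
    cparams.type_k = GGML_TYPE_IQ4_NL;
    // cparams.type_v is left at its default; this commit only extends the K cache.

    llama_context * ctx = llama_new_context_with_model(model, cparams);
    if (ctx == nullptr) {
        llama_free_model(model);
        return 1;
    }

    // ... evaluate tokens as usual ...

    llama_free(ctx);
    llama_free_model(model);
    llama_backend_free();
    return 0;
}
```

The same choice is exposed on the command line through the cache-type options (e.g. llama-bench's `-ctk`), which is where the string-to-type mapping patched below comes into play.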
Diffstat (limited to 'examples')
-rw-r--r-- | examples/llama-bench/llama-bench.cpp | 3
1 file changed, 3 insertions, 0 deletions
diff --git a/examples/llama-bench/llama-bench.cpp b/examples/llama-bench/llama-bench.cpp
index 4cb23080..82413b79 100644
--- a/examples/llama-bench/llama-bench.cpp
+++ b/examples/llama-bench/llama-bench.cpp
@@ -249,6 +249,9 @@ static ggml_type ggml_type_from_name(const std::string & s) {
     if (s == "q5_1") {
         return GGML_TYPE_Q5_1;
     }
+    if (s == "iq4_nl") {
+        return GGML_TYPE_IQ4_NL;
+    }
     return GGML_TYPE_COUNT;
 }
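The one-line mapping above is what lets llama-bench accept `iq4_nl` as a cache-type name. The sketch below is standalone and not taken from the repository (`type_from_name_sketch` is a made-up name); it reproduces the same pattern to show why `GGML_TYPE_COUNT` works as the "unrecognized name" sentinel.

```cpp
#include <cstdio>
#include <string>
#include "ggml.h"

// Same lookup pattern as llama-bench's ggml_type_from_name, reduced to a few entries.
static ggml_type type_from_name_sketch(const std::string & s) {
    if (s == "f16")    { return GGML_TYPE_F16;    }
    if (s == "q8_0")   { return GGML_TYPE_Q8_0;   }
    if (s == "q5_0")   { return GGML_TYPE_Q5_0;   }
    if (s == "q5_1")   { return GGML_TYPE_Q5_1;   }
    if (s == "iq4_nl") { return GGML_TYPE_IQ4_NL; } // the name accepted after this commit
    return GGML_TYPE_COUNT; // sentinel: no real tensor type has this value
}

int main(int argc, char ** argv) {
    const std::string name = argc > 1 ? argv[1] : "iq4_nl";

    const ggml_type t = type_from_name_sketch(name);
    if (t == GGML_TYPE_COUNT) {
        fprintf(stderr, "unrecognized K-cache type: %s\n", name.c_str());
        return 1;
    }

    printf("K-cache type '%s' -> ggml_type %d (%s)\n", name.c_str(), (int) t, ggml_type_name(t));
    return 0;
}
```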