-rw-r--r-- README.md 188
1 file changed, 94 insertions, 94 deletions
diff --git a/README.md b/README.md
index 70875dc1..8a394174 100644
--- a/README.md
+++ b/README.md
@@ -13,7 +13,7 @@ If you are not already familiar with [llama.cpp](https://github.com/ggerganov/ll
Note that I have published some, but not all, of the code in this repository in a series of [llamafile](https://github.com/Mozilla-Ocho/llamafile) PRs ([394](https://github.com/Mozilla-Ocho/llamafile/pull/394), [405](https://github.com/Mozilla-Ocho/llamafile/pull/405), [428](https://github.com/Mozilla-Ocho/llamafile/pull/428), [435](https://github.com/Mozilla-Ocho/llamafile/pull/435), [453](https://github.com/Mozilla-Ocho/llamafile/pull/453), and [464](https://github.com/Mozilla-Ocho/llamafile/pull/464))
-The implementation оф матриь мултиплицатионс is in a single C++ source file (`iqk_mul_mat.cpp`) with just two interface functions `iqk_mul_mat` (`fp16/fp32` and quantized matrix multiplications) and `iqk_mul_mat_moe` (as `iqk_mul_mat` but meant to be used for the FFN part of a MoE model). Under the hood `iqk_mul_mat_moe` uses the same implementation as `iqk_mul_mat`, with the only difference being where results are stored in memory. Bitnet quantization related stuff is in `iqk-quantize.cpp`.
+The implementation of matrix-matrix and matrix-vector multiplications is in a single C++ source file (`iqk_mul_mat.cpp`) with just two interface functions `iqk_mul_mat` (`fp16/fp32` and quantized matrix multiplications) and `iqk_mul_mat_moe` (as `iqk_mul_mat` but meant to be used for the FFN part of a MoE model). Under the hood `iqk_mul_mat_moe` uses the same implementation as `iqk_mul_mat`, with the only difference being where results are stored in memory. Bitnet quantization related stuff is in `iqk-quantize.cpp`.
## Why?
@@ -92,99 +92,99 @@ The command line to generate the data was
./bin/llama-bench -m $model -p 0 -n 128 -t $num_threads -ngl 0
```
-| Quantization | size | backend | threads | t/s (llama.cpp) | t/s (iqk_mul_mat)| Speedup |
-| ------------------------ | ---------: | ---------- | ------: | ---------------: | ---------------: | ------: |
-| 8B F16 | 14.96 GiB | AVX2 | 1 | 2.20 ± 0.00 | 2.25 ± 0.00 | 1.023 |
-| | | | 2 | 3.63 ± 0.00 | 3.68 ± 0.00 | 1.014 |
-| | | | 4 | 4.20 ± 0.00 | 4.20 ± 0.00 | 1.000 |
-| 7B F16 | 12.55 GiB | NEON | 2 | 6.94 ± 0.27 | 7.40 ± 0.01 | 1.066 |
-| | | | 4 | 8.73 ± 0.01 | 8.83 ± 0.01 | 1.011 |
-| | | | 6 | 9.05 ± 0.02 | 9.05 ± 0.01 | 1.000 |
-| 8B Q8_0 | 7.95 GiB | AVX2 | 2 | 5.03 ± 0.00 | 7.87 ± 0.00 | 1.565 |
-| | | | 4 | 7.40 ± 0.00 | 7.82 ± 0.00 | 1.057 |
-| 7B Q8_0 | 6.67 GiB | NEON | 2 | 8.29 ± 0.44 | 12.07 ± 0.10 | 1.456 |
-| | | | 4 | 13.53 ± 0.03 | 15.77 ± 0.08 | 1.166 |
-| | | | 8 | 16.24 ± 0.10 | 16.94 ± 0.04 | 1.043 |
-| 8B Q4_0 | 4.35 GiB | AVX2 | 2 | 6.36 ± 0.00 | 10.28 ± 0.00 | 1.616 |
-| | | | 4 | 10.97 ± 0.06 | 13.55 ± 0.07 | 1.235 |
-| 7B Q4_0 | 3.57 GiB | NEON | 2 | 9.77 ± 0.02 | 13.69 ± 0.03 | 1.401 |
-| | | | 4 | 17.82 ± 0.06 | 23.98 ± 0.11 | 1.346 |
-| | | | 8 | 26.63 ± 0.41 | 29.86 ± 0.04 | 1.121 |
-| 8B Q4_1 | 4.77 GiB | AVX2 | 2 | 5.11 ± 0.00 | 11.45 ± 0.00 | 2.241 |
-| | | | 4 | 9.08 ± 0.02 | 12.58 ± 0.00 | 1.385 |
-| 7B Q4_1 | 3.95 GiB | NEON | 2 | 9.11 ± 0.06 | 14.62 ± 0.04 | 1.605 |
-| | | | 4 | 17.04 ± 0.09 | 24.08 ± 0.28 | 1.413 |
-| | | | 8 | 25.26 ± 0.24 | 27.23 ± 0.14 | 1.078 |
-| 8B Q5_0 | 5.22 GiB | AVX2 | 2 | 5.31 ± 0.01 | 8.30 ± 0.01 | 1.563 |
-| | | | 4 | 9.40 ± 0.01 | 11.47 ± 0.00 | 1.220 |
-| 7B Q5_0 | 4.34 GiB | NEON | 2 | 7.26 ± 0.06 | 7.52 ± 0.00 | 1.036 |
-| | | | 4 | 13.63 ± 0.18 | 14.16 ± 0.10 | 1.039 |
-| | | | 8 | 22.55 ± 0.35 | 24.34 ± 0.22 | 1.079 |
-| 8B Q5_1 | 5.64 GiB | AVX2 | 2 | 4.52 ± 0.00 | 8.86 ± 0.00 | 1.960 |
-| | | | 4 | 7.72 ± 0.05 | 10.68 ± 0.03 | 1.383 |
-| 7B Q5_1 | 4.72 GiB | NEON | 2 | 6.51 ± 0.01 | 6.42 ± 0.03 | 0.986 |
-| | | | 4 | 12.26 ± 0.18 | 12.21 ± 0.14 | 0.996 |
-| | | | 8 | 20.33 ± 0.52 | 21.85 ± 0.22 | 1.075 |
-| 8B Q2_K - Small | 2.78 GiB | AVX2 | 2 | 11.30 ± 0.00 | 13.06 ± 0.01 | 1.156 |
-| | | | 4 | 18.70 ± 0.00 | 19.04 ± 0.65 | 1.014 |
-| 7B Q2_K - Small | 2.16 GiB | NEON | 2 | 8.42 ± 0.05 | 11.97 ± 0.10 | 1.422 |
-| | | | 4 | 15.74 ± 0.01 | 22.09 ± 0.08 | 1.403 |
-| | | | 8 | 27.35 ± 0.05 | 38.32 ± 0.05 | 1.401 |
-| 8B Q3_K - Small | 3.41 GiB | AVX2 | 2 | 8.58 ± 0.00 | 10.82 ± 0.00 | 1.261 |
-| | | | 4 | 15.26 ± 0.01 | 16.25 ± 0.01 | 1.065 |
-| 7B Q3_K - Small | 2.75 GiB | NEON | 2 | 6.40 ± 0.02 | 9.12 ± 0.09 | 1.425 |
-| | | | 4 | 12.17 ± 0.00 | 17.11 ± 0.03 | 1.406 |
-| | | | 8 | 22.04 ± 0.08 | 31.39 ± 0.31 | 1.424 |
-| 8B Q4_K - Small | 4.36 GiB | AVX2 | 2 | 9.61 ± 0.00 | 10.72 ± 0.01 | 1.116 |
-| | | | 4 | 13.24 ± 0.31 | 13.28 ± 0.01 | 1.003 |
-| 7B Q4_K - Small | 3.59 GiB | NEON | 2 | 11.15 ± 0.05 | 12.93 ± 0.09 | 1.160 |
-| | | | 4 | 20.24 ± 0.16 | 23.49 ± 0.29 | 1.161 |
-| | | | 8 | 25.76 ± 0.07 | 28.31 ± 0.22 | 1.099 |
-| 8B Q5_K - Small | 5.21 GiB | AVX2 | 2 | 7.45 ± 0.00 | 9.73 ± 0.00 | 1.306 |
-| | | | 4 | 11.05 ± 0.33 | 11.43 ± 0.02 | 1.034 |
-| 7B Q5_K - Small | 4.33 GiB | NEON | 2 | 7.20 ± 0.04 | 8.81 ± 0.04 | 1.224 |
-| | | | 4 | 13.62 ± 0.15 | 16.81 ± 0.16 | 1.234 |
-| | | | 8 | 20.56 ± 0.19 | 23.96 ± 0.14 | 1.165 |
-| 8B Q6_K | 6.14 GiB | AVX2 | 2 | 7.53 ± 0.00 | 9.42 ± 0.00 | 1.251 |
-| | | | 4 | 9.74 ± 0.00 | 9.97 ± 0.01 | 1.024 |
-| 7B Q6_K | 5.15 GiB | NEON | 2 | 6.85 ± 0.04 | 8.30 ± 0.06 | 1.212 |
-| | | | 4 | 13.03 ± 0.05 | 15.47 ± 0.17 | 1.187 |
-| | | | 8 | 18.52 ± 0.07 | 20.67 ± 0.08 | 1.116 |
-| 8B IQ2_XXS - 2.0625 bpw | 2.23 GiB | AVX2 | 2 | 5.33 ± 0.01 | 6.40 ± 0.00 | 1.201 |
-| | | | 4 | 10.06 ± 0.03 | 11.76 ± 0.03 | 1.169 |
-| 7B IQ2_XXS - 2.0625 bpw | 1.73 GiB | NEON | 2 | 5.07 ± 0.04 | 5.22 ± 0.05 | 1.030 |
-| | | | 4 | 9.63 ± 0.00 | 9.91 ± 0.07 | 1.029 |
-| | | | 8 | 17.40 ± 0.50 | 18.65 ± 0.22 | 1.072 |
-| 8B IQ2_XS - 2.3125 bpw | 2.42 GiB | AVX2 | 2 | 5.83 ± 0.00 | 6.55 ± 0.00 | 1.123 |
-| | | | 4 | 10.88 ± 0.09 | 12.07 ± 0.07 | 1.109 |
-| 7B IQ2_XS - 2.3125 bpw | 1.89 GiB | NEON | 2 | 5.52 ± 0.01 | 5.60 ± 0.00 | 1.014 |
-| | | | 4 | 10.50 ± 0.01 | 11.15 ± 0.00 | 1.062 |
-| | | | 8 | 18.19 ± 1.30 | 20.94 ± 0.19 | 1.151 |
-| 8B IQ2_M - 2.7 bpw | 2.74 GiB | AVX2 | 2 | 5.12 ± 0.01 | 5.17 ± 0.00 | 1.010 |
-| | | | 4 | 9.60 ± 0.28 | 9.68 ± 0.16 | 1.008 |
-| 7B IQ2_M - 2.7 bpw | 2.20 GiB | NEON | 2 | 3.73 ± 0.02 | 4.53 ± 0.00 | 1.214 |
-| | | | 4 | 7.14 ± 0.05 | 8.70 ± 0.06 | 1.218 |
-| | | | 8 | 11.99 ± 0.48 | 16.41 ± 0.05 | 1.369 |
-| 8B IQ3_XXS - 3.0625 bpw | 3.04 GiB | AVX2 | 2 | 4.06 ± 0.01 | 5.00 ± 0.00 | 1.232 |
-| | | | 4 | 7.75 ± 0.02 | 9.13 ± 0.45 | 1.178 |
-| 7B IQ3_XXS - 3.0625 bpw | 2.41 GiB | NEON | 2 | 3.53 ± 0.00 | 3.82 ± 0.00 | 1.082 |
-| | | | 4 | 6.74 ± 0.04 | 7.42 ± 0.07 | 1.103 |
-| | | | 8 | 11.96 ± 0.40 | 13.19 ± 0.29 | 1.103 |
-| 8B IQ3_S - 3.4375 bpw | 3.42 GiB | AVX2 | 2 | 3.62 ± 0.00 | 4.06 ± 0.00 | 1.122 |
-| | | | 4 | 6.80 ± 0.01 | 7.62 ± 0.10 | 1.121 |
-| 7B IQ3_S - 3.4375 bpw | 2.75 GiB | NEON | 2 | 2.96 ± 0.01 | 3.21 ± 0.03 | 1.084 |
-| | | | 4 | 5.68 ± 0.01 | 6.25 ± 0.05 | 1.100 |
-| | | | 8 | 10.32 ± 0.25 | 11.11 ± 0.37 | 1.077 |
-| 8B IQ4_XS - 4.25 bpw | 4.13 GiB | AVX2 | 2 | 8.08 ± 0.00 | 11.35 ± 0.00 | 1.405 |
-| | | | 4 | 13.36 ± 0.72 | 14.32 ± 0.24 | 1.072 |
-| 7B IQ4_XS - 4.25 bpw | 3.37 GiB | NEON | 2 | 9.87 ± 0.03 | 12.06 ± 0.00 | 1.222 |
-| | | | 4 | 17.78 ± 0.23 | 22.06 ± 0.28 | 1.241 |
-| | | | 8 | 27.62 ± 0.09 | 29.70 ± 0.39 | 1.075 |
-| 8B IQ4_NL - 4.5 bpw | 4.35 GiB | AVX2 | 2 | 5.52 ± 0.00 | 10.26 ± 0.00 | 1.859 |
-| | | | 4 | 10.78 ± 0.01 | 13.69 ± 0.08 | 1.270 |
-| 7B IQ4_NL - 4.5 bpw | 3.56 GiB | NEON | 2 | 8.32 ± 0.01 | 13.54 ± 0.01 | 1.627 |
-| | | | 4 | 15.89 ± 0.00 | 24.28 ± 0.29 | 1.528 |
-| | | | 8 | 26.56 ± 0.36 | 29.87 ± 0.08 | 1.125 |
+| Quantization| size | backend | threads | t/s (llama.cpp) | t/s (iqk_mul_mat)| Speedup |
+| ---------- | ---------: | ---------- | ------: | ---------------: | ---------------: | ------: |
+| 8B F16 | 14.96 GiB | AVX2 | 1 | 2.20 ± 0.00 | 2.25 ± 0.00 | 1.023 |
+| | | | 2 | 3.63 ± 0.00 | 3.68 ± 0.00 | 1.014 |
+| | | | 4 | 4.20 ± 0.00 | 4.20 ± 0.00 | 1.000 |
+| 7B F16 | 12.55 GiB | NEON | 2 | 6.94 ± 0.27 | 7.40 ± 0.01 | 1.066 |
+| | | | 4 | 8.73 ± 0.01 | 8.83 ± 0.01 | 1.011 |
+| | | | 6 | 9.05 ± 0.02 | 9.05 ± 0.01 | 1.000 |
+| 8B Q8_0 | 7.95 GiB | AVX2 | 2 | 5.03 ± 0.00 | 7.87 ± 0.00 | 1.565 |
+| | | | 4 | 7.40 ± 0.00 | 7.82 ± 0.00 | 1.057 |
+| 7B Q8_0 | 6.67 GiB | NEON | 2 | 8.29 ± 0.44 | 12.07 ± 0.10 | 1.456 |
+| | | | 4 | 13.53 ± 0.03 | 15.77 ± 0.08 | 1.166 |
+| | | | 8 | 16.24 ± 0.10 | 16.94 ± 0.04 | 1.043 |
+| 8B Q4_0 | 4.35 GiB | AVX2 | 2 | 6.36 ± 0.00 | 10.28 ± 0.00 | 1.616 |
+| | | | 4 | 10.97 ± 0.06 | 13.55 ± 0.07 | 1.235 |
+| 7B Q4_0 | 3.57 GiB | NEON | 2 | 9.77 ± 0.02 | 13.69 ± 0.03 | 1.401 |
+| | | | 4 | 17.82 ± 0.06 | 23.98 ± 0.11 | 1.346 |
+| | | | 8 | 26.63 ± 0.41 | 29.86 ± 0.04 | 1.121 |
+| 8B Q4_1 | 4.77 GiB | AVX2 | 2 | 5.11 ± 0.00 | 11.45 ± 0.00 | 2.241 |
+| | | | 4 | 9.08 ± 0.02 | 12.58 ± 0.00 | 1.385 |
+| 7B Q4_1 | 3.95 GiB | NEON | 2 | 9.11 ± 0.06 | 14.62 ± 0.04 | 1.605 |
+| | | | 4 | 17.04 ± 0.09 | 24.08 ± 0.28 | 1.413 |
+| | | | 8 | 25.26 ± 0.24 | 27.23 ± 0.14 | 1.078 |
+| 8B Q5_0 | 5.22 GiB | AVX2 | 2 | 5.31 ± 0.01 | 8.30 ± 0.01 | 1.563 |
+| | | | 4 | 9.40 ± 0.01 | 11.47 ± 0.00 | 1.220 |
+| 7B Q5_0 | 4.34 GiB | NEON | 2 | 7.26 ± 0.06 | 7.52 ± 0.00 | 1.036 |
+| | | | 4 | 13.63 ± 0.18 | 14.16 ± 0.10 | 1.039 |
+| | | | 8 | 22.55 ± 0.35 | 24.34 ± 0.22 | 1.079 |
+| 8B Q5_1 | 5.64 GiB | AVX2 | 2 | 4.52 ± 0.00 | 8.86 ± 0.00 | 1.960 |
+| | | | 4 | 7.72 ± 0.05 | 10.68 ± 0.03 | 1.383 |
+| 7B Q5_1 | 4.72 GiB | NEON | 2 | 6.51 ± 0.01 | 6.42 ± 0.03 | 0.986 |
+| | | | 4 | 12.26 ± 0.18 | 12.21 ± 0.14 | 0.996 |
+| | | | 8 | 20.33 ± 0.52 | 21.85 ± 0.22 | 1.075 |
+| 8B Q2_K_S | 2.78 GiB | AVX2 | 2 | 11.30 ± 0.00 | 13.06 ± 0.01 | 1.156 |
+| | | | 4 | 18.70 ± 0.00 | 19.04 ± 0.65 | 1.014 |
+| 7B Q2_K_S | 2.16 GiB | NEON | 2 | 8.42 ± 0.05 | 11.97 ± 0.10 | 1.422 |
+| | | | 4 | 15.74 ± 0.01 | 22.09 ± 0.08 | 1.403 |
+| | | | 8 | 27.35 ± 0.05 | 38.32 ± 0.05 | 1.401 |
+| 8B Q3_K_S | 3.41 GiB | AVX2 | 2 | 8.58 ± 0.00 | 10.82 ± 0.00 | 1.261 |
+| | | | 4 | 15.26 ± 0.01 | 16.25 ± 0.01 | 1.065 |
+| 7B Q3_K_S | 2.75 GiB | NEON | 2 | 6.40 ± 0.02 | 9.12 ± 0.09 | 1.425 |
+| | | | 4 | 12.17 ± 0.00 | 17.11 ± 0.03 | 1.406 |
+| | | | 8 | 22.04 ± 0.08 | 31.39 ± 0.31 | 1.424 |
+| 8B Q4_K_S | 4.36 GiB | AVX2 | 2 | 9.61 ± 0.00 | 10.72 ± 0.01 | 1.116 |
+| | | | 4 | 13.24 ± 0.31 | 13.28 ± 0.01 | 1.003 |
+| 7B Q4_K_S | 3.59 GiB | NEON | 2 | 11.15 ± 0.05 | 12.93 ± 0.09 | 1.160 |
+| | | | 4 | 20.24 ± 0.16 | 23.49 ± 0.29 | 1.161 |
+| | | | 8 | 25.76 ± 0.07 | 28.31 ± 0.22 | 1.099 |
+| 8B Q5_K_S | 5.21 GiB | AVX2 | 2 | 7.45 ± 0.00 | 9.73 ± 0.00 | 1.306 |
+| | | | 4 | 11.05 ± 0.33 | 11.43 ± 0.02 | 1.034 |
+| 7B Q5_K_S | 4.33 GiB | NEON | 2 | 7.20 ± 0.04 | 8.81 ± 0.04 | 1.224 |
+| | | | 4 | 13.62 ± 0.15 | 16.81 ± 0.16 | 1.234 |
+| | | | 8 | 20.56 ± 0.19 | 23.96 ± 0.14 | 1.165 |
+| 8B Q6_K | 6.14 GiB | AVX2 | 2 | 7.53 ± 0.00 | 9.42 ± 0.00 | 1.251 |
+| | | | 4 | 9.74 ± 0.00 | 9.97 ± 0.01 | 1.024 |
+| 7B Q6_K | 5.15 GiB | NEON | 2 | 6.85 ± 0.04 | 8.30 ± 0.06 | 1.212 |
+| | | | 4 | 13.03 ± 0.05 | 15.47 ± 0.17 | 1.187 |
+| | | | 8 | 18.52 ± 0.07 | 20.67 ± 0.08 | 1.116 |
+| 8B IQ2_XXS | 2.23 GiB | AVX2 | 2 | 5.33 ± 0.01 | 6.40 ± 0.00 | 1.201 |
+| | | | 4 | 10.06 ± 0.03 | 11.76 ± 0.03 | 1.169 |
+| 7B IQ2_XXS | 1.73 GiB | NEON | 2 | 5.07 ± 0.04 | 5.22 ± 0.05 | 1.030 |
+| | | | 4 | 9.63 ± 0.00 | 9.91 ± 0.07 | 1.029 |
+| | | | 8 | 17.40 ± 0.50 | 18.65 ± 0.22 | 1.072 |
+| 8B IQ2_XS | 2.42 GiB | AVX2 | 2 | 5.83 ± 0.00 | 6.55 ± 0.00 | 1.123 |
+| | | | 4 | 10.88 ± 0.09 | 12.07 ± 0.07 | 1.109 |
+| 7B IQ2_XS | 1.89 GiB | NEON | 2 | 5.52 ± 0.01 | 5.60 ± 0.00 | 1.014 |
+| | | | 4 | 10.50 ± 0.01 | 11.15 ± 0.00 | 1.062 |
+| | | | 8 | 18.19 ± 1.30 | 20.94 ± 0.19 | 1.151 |
+| 8B IQ2_M | 2.74 GiB | AVX2 | 2 | 5.12 ± 0.01 | 5.17 ± 0.00 | 1.010 |
+| | | | 4 | 9.60 ± 0.28 | 9.68 ± 0.16 | 1.008 |
+| 7B IQ2_M | 2.20 GiB | NEON | 2 | 3.73 ± 0.02 | 4.53 ± 0.00 | 1.214 |
+| | | | 4 | 7.14 ± 0.05 | 8.70 ± 0.06 | 1.218 |
+| | | | 8 | 11.99 ± 0.48 | 16.41 ± 0.05 | 1.369 |
+| 8B IQ3_XXS | 3.04 GiB | AVX2 | 2 | 4.06 ± 0.01 | 5.00 ± 0.00 | 1.232 |
+| | | | 4 | 7.75 ± 0.02 | 9.13 ± 0.45 | 1.178 |
+| 7B IQ3_XXS | 2.41 GiB | NEON | 2 | 3.53 ± 0.00 | 3.82 ± 0.00 | 1.082 |
+| | | | 4 | 6.74 ± 0.04 | 7.42 ± 0.07 | 1.103 |
+| | | | 8 | 11.96 ± 0.40 | 13.19 ± 0.29 | 1.103 |
+| 8B IQ3_S | 3.42 GiB | AVX2 | 2 | 3.62 ± 0.00 | 4.06 ± 0.00 | 1.122 |
+| | | | 4 | 6.80 ± 0.01 | 7.62 ± 0.10 | 1.121 |
+| 7B IQ3_S | 2.75 GiB | NEON | 2 | 2.96 ± 0.01 | 3.21 ± 0.03 | 1.084 |
+| | | | 4 | 5.68 ± 0.01 | 6.25 ± 0.05 | 1.100 |
+| | | | 8 | 10.32 ± 0.25 | 11.11 ± 0.37 | 1.077 |
+| 8B IQ4_XS | 4.13 GiB | AVX2 | 2 | 8.08 ± 0.00 | 11.35 ± 0.00 | 1.405 |
+| | | | 4 | 13.36 ± 0.72 | 14.32 ± 0.24 | 1.072 |
+| 7B IQ4_XS | 3.37 GiB | NEON | 2 | 9.87 ± 0.03 | 12.06 ± 0.00 | 1.222 |
+| | | | 4 | 17.78 ± 0.23 | 22.06 ± 0.28 | 1.241 |
+| | | | 8 | 27.62 ± 0.09 | 29.70 ± 0.39 | 1.075 |
+| 8B IQ4_NL | 4.35 GiB | AVX2 | 2 | 5.52 ± 0.00 | 10.26 ± 0.00 | 1.859 |
+| | | | 4 | 10.78 ± 0.01 | 13.69 ± 0.08 | 1.270 |
+| 7B IQ4_NL | 3.56 GiB | NEON | 2 | 8.32 ± 0.01 | 13.54 ± 0.01 | 1.627 |
+| | | | 4 | 15.89 ± 0.00 | 24.28 ± 0.29 | 1.528 |
+| | | | 8 | 26.56 ± 0.36 | 29.87 ± 0.08 | 1.125 |
Here gains are generally lower compared to PP due to TG performance being limited by memory bandwidth. Nevertheless, for some quants/architectures/threads the speedup is quite remarkable (e.g., almost a factor of 2 for `Q5_1` on `AVX2` with 2 threads).