author Kawrakow <48489457+ikawrakow@users.noreply.github.com> 2024-08-05 07:35:30 +0200
committer GitHub <noreply@github.com> 2024-08-05 07:35:30 +0200
commit c11c7c8cae5ab1abf41c16b7bb27439bb0983c54
tree 7e8330db63aaddac8d6f2d95641cded46389efdc
parent 6901b3bf14ee56b04a6fd50313fe775f871b2722
Update README.md
There have been a few minor improvements here and there, so the AVX2 Bitnet performance values are updated to the current main branch.
-rw-r--r-- README.md 16
1 file changed, 8 insertions, 8 deletions
diff --git a/README.md b/README.md
index 77072b42..fb78c069 100644
--- a/README.md
+++ b/README.md
@@ -222,27 +222,27 @@ There is the unmerged [PR 8151](https://github.com/ggerganov/llama.cpp/pull/8151
| model | size | backend | threads | test | t/s (llama.cpp) | t/s (this repo)| Speedup |
| ----------- | ---------: | ---------- | ------: | -----: | ---------------: | -------------: | ------: |
-| 3B - IQ1_BN | 729.64 MiB | AVX2 | 16 | pp512 | 120.61 ± 0.48 | 407.06 ± 0.80 | 3.380 |
+| 3B - IQ1_BN | 729.64 MiB | AVX2 | 16 | pp512 | 120.61 ± 0.48 | 423.19 ± 1.28 | 3.509 |
| | | NEON | 8 | pp512 | 46.64 ± 0.02 | 205.90 ± 0.88 | 4.415 |
| | | CUDA | 8 | pp512 | - | 10660 ± 170 | - |
| | | Metal | 8 | pp512 | - | 698.25 ± 1.91 | - |
| | | AVX2 | 2 | tg128 | 15.79 ± 0.01 | 22.13 ± 0.02 | 1.402 |
| | | AVX2 | 4 | tg128 | 28.64 ± 1.72 | 40.14 ± 0.04 | 1.402 |
-| | | AVX2 | 8 | tg128 | 48.91 ± 0.08 | 57.76 ± 2.86 | 1.181 |
-| | | AVX2 | 16 | tg128 | 57.73 ± 0.05 | 60.14 ± 0.04 | 1.042 |
+| | | AVX2 | 8 | tg128 | 48.91 ± 0.08 | 61.79 ± 0.09 | 1.263 |
+| | | AVX2 | 16 | tg128 | 57.73 ± 0.05 | 60.79 ± 0.05 | 1.053 |
| | | NEON | 2 | tg128 | 11.43 ± 0.04 | 16.87 ± 0.02 | 1.476 |
| | | NEON | 4 | tg128 | 21.11 ± 0.05 | 30.66 ± 0.11 | 1.452 |
| | | NEON | 8 | tg128 | 37.36 ± 0.07 | 55.21 ± 0.16 | 1.478 |
| | | CUDA | 8 | tg128 | - | 301.44 ± 0.12 | - |
| | | Metal | 8 | tg128 | - | 76.70 ± 0.07 | - |
-| 3B - IQ2_BN | 873.65 MiB | AVX2 | 16 | pp512 | 151.39 ± 0.35 | 512.79 ± 2.58 | 3.387 |
+| 3B - IQ2_BN | 873.65 MiB | AVX2 | 16 | pp512 | 151.39 ± 0.35 | 540.82 ± 2.48 | 3.572 |
| | | NEON | 8 | pp512 | 46.54 ± 0.03 | 242.05 ± 0.34 | 5.201 |
| | | CUDA | 8 | pp512 | - | 10800 ± 160 | - |
| | | Metal | 8 | pp512 | - | 723.19 ± 0.53 | - |
-| | | AVX2 | 2 | tg128 | 18.93 ± 0.02 | 37.42 ± 0.07 | 1.978 |
-| | | AVX2 | 4 | tg128 | 34.54 ± 0.06 | 53.25 ± 0.02 | 1.542 |
-| | | AVX2 | 8 | tg128 | 52.97 ± 0.07 | 52.06 ± 0.08 | 0.983 |
-| | | AVX2 | 16 | tg128 | 51.84 ± 0.25 | 52.98 ± 0.03 | 1.022 |
+| | | AVX2 | 2 | tg128 | 18.93 ± 0.02 | 38.34 ± 0.08 | 2.026 |
+| | | AVX2 | 4 | tg128 | 34.54 ± 0.06 | 56.29 ± 0.07 | 1.630 |
+| | | AVX2 | 8 | tg128 | 52.97 ± 0.07 | 53.44 ± 0.08 | 1.009 |
+| | | AVX2 | 16 | tg128 | 51.84 ± 0.25 | 53.46 ± 0.07 | 1.031 |
| | | NEON | 2 | tg128 | 11.40 ± 0.02 | 32.01 ± 0.27 | 2.808 |
| | | NEON | 4 | tg128 | 20.99 ± 0.00 | 56.45 ± 0.11 | 2.689 |
| | | NEON | 8 | tg128 | 37.28 ± 0.08 | 89.77 ± 0.70 | 2.408 |
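
For readers checking the updated rows, the Speedup column appears to be the ratio of the mean t/s for this repo to the mean t/s for llama.cpp, ignoring the ± uncertainty. That is an assumption inferred from the numbers, not something the commit states, and the `speedup` helper below is a hypothetical name used only for illustration. A minimal sketch that recomputes three of the changed AVX2 entries:

```python
def speedup(baseline_tps: float, new_tps: float) -> float:
    """Ratio of new throughput to baseline throughput (assumed definition of the Speedup column)."""
    if baseline_tps <= 0:
        raise ValueError("baseline throughput must be positive")
    return new_tps / baseline_tps


# (label, llama.cpp mean t/s, this-repo mean t/s), copied from the added ('+') rows of the diff
rows = [
    ("3B IQ1_BN, AVX2, 16 threads, pp512", 120.61, 423.19),
    ("3B IQ1_BN, AVX2, 8 threads, tg128", 48.91, 61.79),
    ("3B IQ2_BN, AVX2, 16 threads, pp512", 151.39, 540.82),
]

for label, base, new in rows:
    # Three decimals, matching the precision used in the table
    print(f"{label}: {speedup(base, new):.3f}")
```

Under that assumption the sketch gives 3.509, 1.263 and 3.572, which agree with the Speedup values in the corresponding added rows.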