author | Kawrakow <48489457+ikawrakow@users.noreply.github.com> | 2024-08-05 07:35:30 +0200 |
---|---|---|
committer | GitHub <noreply@github.com> | 2024-08-05 07:35:30 +0200 |
commit | c11c7c8cae5ab1abf41c16b7bb27439bb0983c54 (patch) | |
tree | 7e8330db63aaddac8d6f2d95641cded46389efdc | |
parent | 6901b3bf14ee56b04a6fd50313fe775f871b2722 (diff) |
Update README.md
There have been a few minor improvements here and there, so I updated the AVX2 Bitnet performance values to the current main branch.
-rw-r--r-- | README.md | 16 |
1 files changed, 8 insertions, 8 deletions
@@ -222,27 +222,27 @@ There is the unmerged [PR 8151](https://github.com/ggerganov/llama.cpp/pull/8151
 | model | size | backend | threads | test | t/s (llama.cpp) | t/s (this repo)| Speedup |
 | ----------- | ---------: | ---------- | ------: | -----: | ---------------: | -------------: | ------: |
-| 3B - IQ1_BN | 729.64 MiB | AVX2 | 16 | pp512 | 120.61 ± 0.48 | 407.06 ± 0.80 | 3.380 |
+| 3B - IQ1_BN | 729.64 MiB | AVX2 | 16 | pp512 | 120.61 ± 0.48 | 423.19 ± 1.28 | 3.509 |
 | | | NEON | 8 | pp512 | 46.64 ± 0.02 | 205.90 ± 0.88 | 4.415 |
 | | | CUDA | 8 | pp512 | - | 10660 ± 170 | - |
 | | | Metal | 8 | pp512 | - | 698.25 ± 1.91 | - |
 | | | AVX2 | 2 | tg128 | 15.79 ± 0.01 | 22.13 ± 0.02 | 1.402 |
 | | | AVX2 | 4 | tg128 | 28.64 ± 1.72 | 40.14 ± 0.04 | 1.402 |
-| | | AVX2 | 8 | tg128 | 48.91 ± 0.08 | 57.76 ± 2.86 | 1.181 |
-| | | AVX2 | 16 | tg128 | 57.73 ± 0.05 | 60.14 ± 0.04 | 1.042 |
+| | | AVX2 | 8 | tg128 | 48.91 ± 0.08 | 61.79 ± 0.09 | 1.263 |
+| | | AVX2 | 16 | tg128 | 57.73 ± 0.05 | 60.79 ± 0.05 | 1.053 |
 | | | NEON | 2 | tg128 | 11.43 ± 0.04 | 16.87 ± 0.02 | 1.476 |
 | | | NEON | 4 | tg128 | 21.11 ± 0.05 | 30.66 ± 0.11 | 1.452 |
 | | | NEON | 8 | tg128 | 37.36 ± 0.07 | 55.21 ± 0.16 | 1.478 |
 | | | CUDA | 8 | tg128 | - | 301.44 ± 0.12 | - |
 | | | Metal | 8 | tg128 | - | 76.70 ± 0.07 | - |
-| 3B - IQ2_BN | 873.65 MiB | AVX2 | 16 | pp512 | 151.39 ± 0.35 | 512.79 ± 2.58 | 3.387 |
+| 3B - IQ2_BN | 873.65 MiB | AVX2 | 16 | pp512 | 151.39 ± 0.35 | 540.82 ± 2.48 | 3.572 |
 | | | NEON | 8 | pp512 | 46.54 ± 0.03 | 242.05 ± 0.34 | 5.201 |
 | | | CUDA | 8 | pp512 | - | 10800 ± 160 | - |
 | | | Metal | 8 | pp512 | - | 723.19 ± 0.53 | - |
-| | | AVX2 | 2 | tg128 | 18.93 ± 0.02 | 37.42 ± 0.07 | 1.978 |
-| | | AVX2 | 4 | tg128 | 34.54 ± 0.06 | 53.25 ± 0.02 | 1.542 |
-| | | AVX2 | 8 | tg128 | 52.97 ± 0.07 | 52.06 ± 0.08 | 0.983 |
-| | | AVX2 | 16 | tg128 | 51.84 ± 0.25 | 52.98 ± 0.03 | 1.022 |
+| | | AVX2 | 2 | tg128 | 18.93 ± 0.02 | 38.34 ± 0.08 | 2.026 |
+| | | AVX2 | 4 | tg128 | 34.54 ± 0.06 | 56.29 ± 0.07 | 1.630 |
+| | | AVX2 | 8 | tg128 | 52.97 ± 0.07 | 53.44 ± 0.08 | 1.009 |
+| | | AVX2 | 16 | tg128 | 51.84 ± 0.25 | 53.46 ± 0.07 | 1.031 |
 | | | NEON | 2 | tg128 | 11.40 ± 0.02 | 32.01 ± 0.27 | 2.808 |
 | | | NEON | 4 | tg128 | 20.99 ± 0.00 | 56.45 ± 0.11 | 2.689 |
 | | | NEON | 8 | tg128 | 37.28 ± 0.08 | 89.77 ± 0.70 | 2.408 |
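
The Speedup column appears to be the ratio of the two throughput columns, t/s (this repo) divided by t/s (llama.cpp). The README does not state this, so treat it as an assumption. A minimal Python sketch for rechecking some of the updated AVX2 rows under that assumption follows; the `speedup` helper and the `rows` layout are hypothetical names used only for illustration.

```python
def speedup(llama_cpp_tps: float, this_repo_tps: float) -> float:
    """Assumed definition: this repo's tokens/s divided by llama.cpp's,
    rounded to three decimals like the table."""
    return round(this_repo_tps / llama_cpp_tps, 3)


# (model, threads, test, t/s llama.cpp, t/s this repo), mean values
# copied from the "+" rows of the diff; the ± errors are ignored.
rows = [
    ("3B - IQ1_BN", 16, "pp512", 120.61, 423.19),
    ("3B - IQ1_BN", 8, "tg128", 48.91, 61.79),
    ("3B - IQ1_BN", 16, "tg128", 57.73, 60.79),
    ("3B - IQ2_BN", 16, "pp512", 151.39, 540.82),
    ("3B - IQ2_BN", 4, "tg128", 34.54, 56.29),
    ("3B - IQ2_BN", 8, "tg128", 52.97, 53.44),
]

for model, threads, test, base, new in rows:
    print(f"{model:12s} {threads:2d} {test}  speedup = {speedup(base, new):.3f}")
```

Because the table lists rounded means, a ratio recomputed this way can differ from the published Speedup in the last digit. The 2-thread IQ2_BN tg128 row, for example, gives roughly 2.025 from the rounded values, while the table lists 2.026.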