author | Kawrakow <iwankawrakow@gmail.com> | 2025-01-22 12:13:55 +0200 |
---|---|---|
committer | GitHub <noreply@github.com> | 2025-01-22 12:13:55 +0200 |
commit | dbf5d31d01e14a0ba692efafca5e4d66ada60b8a (patch) | |
tree | 64c7022a940a48c5f3153429758a9e1083f1edda /src/llama.cpp | |
parent | 6d23495b9bb8945c6ec1c38ced4b44180fbac3c6 (diff) |
Better BF16 support on AVX2 (#175)
* Adding BF16 support for AVX2
PP performance is the same as fp16 (~153 t/s on Ryzen-5975WX),
but TG is quite a bit lower (3.65 t/s vs 4.72 t/s at 8 threads).
Why?
* Slightly faster fp16/bf16 gemv on AVX2
It still saturates at the same lower performance for bf16
---------
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
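
For context on what BF16 support on a CPU without native bf16 instructions involves, here is a minimal scalar sketch. It is not taken from this commit. It relies on the fact that a bf16 value is the upper 16 bits of an IEEE-754 fp32 value, so widening to fp32 is a 16-bit left shift. The helper names (`bf16_to_fp32`, `fp32_to_bf16_trunc`, `dot_bf16_fp32`) are hypothetical, and the assumption that a vectorized AVX2 kernel would apply the same widen-and-shift step per lane is mine, not the author's.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Widen a bf16 value (stored as raw 16 bits) to fp32.
// bf16 keeps the sign, the 8 exponent bits and the top 7 mantissa bits
// of fp32, so placing it in the upper half of a 32-bit word is exact.
inline float bf16_to_fp32(std::uint16_t h) {
    std::uint32_t bits = static_cast<std::uint32_t>(h) << 16;
    float f;
    std::memcpy(&f, &bits, sizeof f);  // bit-cast without aliasing issues
    return f;
}

// Hypothetical helper for building inputs: truncating fp32 -> bf16.
// Production code would normally round to nearest even instead.
inline std::uint16_t fp32_to_bf16_trunc(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return static_cast<std::uint16_t>(bits >> 16);
}

// One row of a gemv: dot product of a bf16 weight row with an fp32 vector,
// accumulating in fp32.
inline float dot_bf16_fp32(const std::uint16_t* w, const float* x, std::size_t n) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        sum += bf16_to_fp32(w[i]) * x[i];
    }
    return sum;
}
```

Calling `dot_bf16_fp32` once per weight row gives a reference gemv against which a vectorized kernel could be checked.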
Diffstat (limited to 'src/llama.cpp')
0 files changed, 0 insertions, 0 deletions