path: root/src/llama.cpp
author	Kawrakow <iwankawrakow@gmail.com>	2025-01-22 12:13:55 +0200
committer	GitHub <noreply@github.com>	2025-01-22 12:13:55 +0200
commit	dbf5d31d01e14a0ba692efafca5e4d66ada60b8a (patch)
tree	64c7022a940a48c5f3153429758a9e1083f1edda /src/llama.cpp
parent	6d23495b9bb8945c6ec1c38ced4b44180fbac3c6 (diff)
Better BF16 support on AVX2 (#175)
* Adding BF16 support for AVX2

  PP performance is the same as fp16 (~153 t/s on Ryzen-5975WX), but TG is quite a bit lower (3.65 t/s vs 4.72 t/s at 8 threads). Why?

* Slightly faster fp16/bf16 gemv on AVX2

  It still saturates at the same lower performance for bf16.

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
Diffstat (limited to 'src/llama.cpp')
0 files changed, 0 insertions, 0 deletions
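The commit title concerns widening bf16 data for AVX2 matrix-vector kernels. As a rough illustration only (not the code from this commit, whose actual changes live outside src/llama.cpp), a minimal AVX2 sketch of the bf16 -> fp32 widening step and a gemv-style dot product could look like the following; all function names here are hypothetical.

    #include <immintrin.h>
    #include <cstdint>

    // Hypothetical sketch: bf16 holds the upper 16 bits of an IEEE-754 binary32,
    // so widening 8 bf16 values to fp32 on AVX2 is a zero-extension followed by
    // a 16-bit left shift.
    static inline __m256 bf16_to_f32_avx2(const uint16_t * x) {
        __m128i v16 = _mm_loadu_si128((const __m128i *)x);          // load 8 x bf16
        __m256i v32 = _mm256_cvtepu16_epi32(v16);                    // zero-extend to 32 bits
        return _mm256_castsi256_ps(_mm256_slli_epi32(v32, 16));      // shift into float position
    }

    // Hypothetical gemv-style row dot product using the conversion above.
    // Assumes n is a multiple of 8 and the CPU supports AVX2 + FMA.
    static float dot_bf16_f32_avx2(const uint16_t * row, const float * x, int n) {
        __m256 acc = _mm256_setzero_ps();
        for (int i = 0; i < n; i += 8) {
            acc = _mm256_fmadd_ps(bf16_to_f32_avx2(row + i), _mm256_loadu_ps(x + i), acc);
        }
        // horizontal sum of the 8 accumulator lanes
        __m128 lo = _mm256_castps256_ps128(acc);
        __m128 hi = _mm256_extractf128_ps(acc, 1);
        lo = _mm_add_ps(lo, hi);
        lo = _mm_hadd_ps(lo, lo);
        lo = _mm_hadd_ps(lo, lo);
        return _mm_cvtss_f32(lo);
    }

Because the conversion is only a shift, prompt processing (PP) throughput can match fp16, while token generation (TG) is more sensitive to the extra memory traffic and conversion latency per row, which is consistent with the numbers quoted in the commit message.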