path: root/src/llama.cpp
author	Kawrakow <iwankawrakow@gmail.com>	2025-01-22 12:13:55 +0200
committer	GitHub <noreply@github.com>	2025-01-22 12:13:55 +0200
commit	dbf5d31d01e14a0ba692efafca5e4d66ada60b8a (patch)
tree	64c7022a940a48c5f3153429758a9e1083f1edda /src/llama.cpp
parent	6d23495b9bb8945c6ec1c38ced4b44180fbac3c6 (diff)
Better BF16 support on AVX2 (#175)
* Adding BF16 support for AVX2

  PP performance is the same as fp16 (~153 t/s on Ryzen-5975WX), but TG is quite a bit lower (3.65 t/s vs 4.72 t/s at 8 threads). Why?

* Slightly faster fp16/bf16 gemv on AVX2

  It still saturates at the same lower performance for bf16.

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
Diffstat (limited to 'src/llama.cpp')
0 files changed, 0 insertions, 0 deletions
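The commit title concerns widening bf16 data for AVX2 matrix-vector kernels. As a rough illustration only (not the code from this commit, whose actual changes live outside src/llama.cpp), a minimal AVX2 sketch of the bf16 -> fp32 widening step and a gemv-style dot product could look like the following; all function names here are hypothetical.

    #include <immintrin.h>
    #include <cstdint>

    // Hypothetical sketch: bf16 holds the upper 16 bits of an IEEE-754 binary32,
    // so widening 8 bf16 values to fp32 on AVX2 is a zero-extension followed by
    // a 16-bit left shift.
    static inline __m256 bf16_to_f32_avx2(const uint16_t * x) {
        __m128i v16 = _mm_loadu_si128((const __m128i *)x);          // load 8 x bf16
        __m256i v32 = _mm256_cvtepu16_epi32(v16);                    // zero-extend to 32 bits
        return _mm256_castsi256_ps(_mm256_slli_epi32(v32, 16));      // shift into float position
    }

    // Hypothetical gemv-style row dot product using the conversion above.
    // Assumes n is a multiple of 8 and the CPU supports AVX2 + FMA.
    static float dot_bf16_f32_avx2(const uint16_t * row, const float * x, int n) {
        __m256 acc = _mm256_setzero_ps();
        for (int i = 0; i < n; i += 8) {
            acc = _mm256_fmadd_ps(bf16_to_f32_avx2(row + i), _mm256_loadu_ps(x + i), acc);
        }
        // horizontal sum of the 8 accumulator lanes
        __m128 lo = _mm256_castps256_ps128(acc);
        __m128 hi = _mm256_extractf128_ps(acc, 1);
        lo = _mm_add_ps(lo, hi);
        lo = _mm_hadd_ps(lo, lo);
        lo = _mm_hadd_ps(lo, lo);
        return _mm_cvtss_f32(lo);
    }

Because the conversion is only a shift, prompt processing (PP) throughput can match fp16, while token generation (TG) is more sensitive to the extra memory traffic and conversion latency per row, which is consistent with the numbers quoted in the commit message.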