author | Kawrakow <48489457+ikawrakow@users.noreply.github.com> | 2024-09-05 07:46:47 +0300 |
---|---|---|
committer | GitHub <noreply@github.com> | 2024-09-05 07:46:47 +0300 |
commit | 7b1b2b2c06c1729139135c9e47611af7161de6f7 (patch) | |
tree | ab79924dbb9f2ff780dd669fa65f826aae74d0b7 /examples/parallel/README.md | |
parent | f17d0d72f565bf24d6eb8aa67d6618cdc143961d (diff) |
Zen4 Flash Attention - bf16 support (#38)
* Zen4 Flash Attention: WIP bf16
* Zen4 Flash Attention: bf16 seems to be working
* Zen4 Flash Attention: improving bf16
* Zen4 Flash Attention: improving bf16
It is slightly faster to convert Q to bf16 once,
before processing each block of q_step rows.
This requires D*q_step*sizeof(bf16) bytes, which is
at most 4 kB for the head sizes we support, so we
can allocate the buffer on the stack instead of
reserving and passing a work buffer in ggml.
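
The buffer sizing described above can be sketched roughly as follows. This is an illustration only, not the code from this commit: the limits `kMaxHeadSize = 256` and `kQStep = 8` are assumed values chosen so that D*q_step*sizeof(bf16) comes to 4096 bytes, and every name here (`float_to_bf16`, `convert_q_block`, `process_q_block`) is hypothetical. The conversion is a portable scalar version, where the real kernel would presumably use Zen4 SIMD instructions.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Assumed limits (not taken from the commit): 256 * 8 * 2 bytes = 4096 bytes.
constexpr int kMaxHeadSize = 256;  // assumed largest supported head size D
constexpr int kQStep       = 8;    // assumed number of Q rows per block

using bf16_t = std::uint16_t;      // raw bf16 bit pattern

static_assert(kMaxHeadSize * kQStep * sizeof(bf16_t) == 4096,
              "scratch buffer for one Q block should be 4096 bytes");

// Scalar float -> bf16 conversion with round-to-nearest-even.
inline bf16_t float_to_bf16(float x) {
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof(bits));
    if ((bits & 0x7fffffffu) > 0x7f800000u) {
        // NaN: keep the sign and exponent, force a quiet-NaN mantissa bit.
        return static_cast<bf16_t>((bits >> 16) | 0x0040u);
    }
    bits += 0x7fffu + ((bits >> 16) & 1u);  // round half to even
    return static_cast<bf16_t>(bits >> 16);
}

// bf16 -> float is exact: the 16 bits become the high half of the float.
inline float bf16_to_float(bf16_t h) {
    std::uint32_t bits = static_cast<std::uint32_t>(h) << 16;
    float x;
    std::memcpy(&x, &bits, sizeof(x));
    return x;
}

// Converts nrows rows of Q (row stride given in floats) into a dense bf16 block.
inline void convert_q_block(const float* q, int stride, int D, int nrows, bf16_t* out) {
    for (int r = 0; r < nrows; ++r) {
        for (int c = 0; c < D; ++c) {
            out[r * D + c] = float_to_bf16(q[r * stride + c]);
        }
    }
}

// Processes one block of Q rows using a stack-allocated bf16 scratch buffer,
// so no work buffer has to be reserved and passed in by the caller.
// As a stand-in for the attention computation, it returns the sum of the
// converted values.
inline float process_q_block(const float* q, int stride, int D, int nrows) {
    assert(D <= kMaxHeadSize && nrows <= kQStep);
    bf16_t q_bf16[kMaxHeadSize * kQStep];  // 4096 bytes on the stack
    convert_q_block(q, stride, D, nrows, q_bf16);
    float sum = 0.0f;
    for (int i = 0; i < D * nrows; ++i) {
        sum += bf16_to_float(q_bf16[i]);
    }
    return sum;
}
```

Converting the block once up front means each Q row is converted a single time rather than once per K/V block it is multiplied against, which is the likely source of the small speedup mentioned above.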
---------
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
Diffstat (limited to 'examples/parallel/README.md')
0 files changed, 0 insertions, 0 deletions