author | Kawrakow <48489457+ikawrakow@users.noreply.github.com> | 2024-09-04 07:20:55 +0300 |
---|---|---|
committer | GitHub <noreply@github.com> | 2024-09-04 07:20:55 +0300 |
commit | 8c94dcd43350b6bde8f5618f7e0e9f0b400a2ac6 (patch) | |
tree | 8a109e151d38447d1659fd4494cb160a3d585ca3 /examples | |
parent | 9b53c2533fb8c236f319b874c5ff592de8fcd3b4 (diff) |
Zen4 Flash Attention 2 (#36)
* Zen4 Flash Attention: WIP generalize to other types
Loading of data from K and V is now done via a template parameter,
which should make it easy to generalize to types other than
F16 for the K and V cache.
* Zen4 Flash Attention: it works for q4_0 and q8_0
* Zen4 Flash Attention: small q8_0 performance improvement
* Zen4 Flash Attention: add q4_1
* Delete unused stuff
---------
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
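
The idea of selecting the K/V load path through a template parameter can be illustrated with a minimal scalar sketch. This is not the code from the commit: the names `LoaderF32`, `LoaderQ8`, `BlockQ8`, `quantize_q8` and `qk_dot` are hypothetical, `float` stands in for F16, and the block layout is a simplified Q8_0-style format (32 int8 values sharing one `float` scale) assumed for illustration only. The actual kernels are vectorized for Zen4.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical loader for an unquantized cache row (float stands in for F16).
struct LoaderF32 {
    const float* data;
    float load(std::size_t i) const { return data[i]; }
};

// Hypothetical Q8_0-style block: 32 int8 values sharing one scale.
// Simplification: the scale is stored as float rather than half precision.
constexpr std::size_t kBlock = 32;
struct BlockQ8 {
    float d;
    std::int8_t qs[kBlock];
};

// Hypothetical loader that dequantizes one element on the fly.
struct LoaderQ8 {
    const BlockQ8* blocks;
    float load(std::size_t i) const {
        const BlockQ8& b = blocks[i / kBlock];
        return b.d * static_cast<float>(b.qs[i % kBlock]);
    }
};

// Quantize n floats (n must be a multiple of kBlock) into Q8_0-style blocks.
inline std::vector<BlockQ8> quantize_q8(const float* x, std::size_t n) {
    assert(n % kBlock == 0);
    std::vector<BlockQ8> out(n / kBlock);
    for (std::size_t ib = 0; ib < out.size(); ++ib) {
        float amax = 0.0f;
        for (std::size_t j = 0; j < kBlock; ++j)
            amax = std::fmax(amax, std::fabs(x[ib * kBlock + j]));
        const float d = amax / 127.0f;
        const float inv = d > 0.0f ? 1.0f / d : 0.0f;
        out[ib].d = d;
        for (std::size_t j = 0; j < kBlock; ++j)
            out[ib].qs[j] = static_cast<std::int8_t>(std::lround(x[ib * kBlock + j] * inv));
    }
    return out;
}

// The kernel is written once; the K element type is chosen by the template parameter.
template <typename KLoader>
float qk_dot(const float* q, const KLoader& k, std::size_t n) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < n; ++i) sum += q[i] * k.load(i);
    return sum;
}
```

With this shape, supporting another cache type only requires a new loader struct with a `load` member; `qk_dot` itself stays unchanged, and the compiler can inline the chosen load path into the loop.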
Diffstat (limited to 'examples')
0 files changed, 0 insertions, 0 deletions