author | Kawrakow <iwankawrakow@gmail.com> | 2025-03-03 15:17:51 +0200 |
---|---|---|
committer | GitHub <noreply@github.com> | 2025-03-03 15:17:51 +0200 |
commit | a87e54db6ec2409284a55f029d4abe9e50990064 (patch) | |
tree | 920bb8ce4fbd35e54bda3b61a86d0f87c2ac0ede /examples/server_embd.py | |
parent | a89adaa78f505675be7be6180f419b4b0158c15a (diff) |
Flash MLA (CPU only) (#240)
* FlashMLA - it finally works (on the CPU)
* FlashMLA: allow for f16 and bf16 cache in addition to q8_0
* It works with ggml FA, not with iqk FA
* WIP
* FlashMLA: it now works with iqk
I had forgotten to divide the Q stride by sizeof(float), and
that's why, very confusingly, it was working for TG but not for PP.
* WIP
* FlashMLA: that should be it for now
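
For readers unfamiliar with the stride issue mentioned in the message, here is a minimal sketch of the general pitfall. It is not the code from this commit. ggml stores tensor strides in bytes, so a stride has to be divided by the element size before it is used to index a typed pointer. The helper names `make_matrix` and `first_of_row` are hypothetical, and the contiguous row-major float layout is an assumption made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: builds a contiguous row-major float matrix where
// element (r, c) holds 100*r + c, so each row is easy to recognise.
static std::vector<float> make_matrix(int n_rows, int n_cols) {
    std::vector<float> m(static_cast<size_t>(n_rows) * n_cols);
    for (int r = 0; r < n_rows; ++r) {
        for (int c = 0; c < n_cols; ++c) {
            m[static_cast<size_t>(r) * n_cols + c] = 100.0f * r + c;
        }
    }
    return m;
}

// Hypothetical helper: returns the first element of row `row`, given the
// row stride in BYTES. The stride is converted to an element count before
// indexing the float pointer; using the byte stride directly would step
// sizeof(float) times too far for every row after the first.
static float first_of_row(const float * q, size_t stride_bytes, int row) {
    assert(stride_bytes % sizeof(float) == 0);
    const size_t stride_elems = stride_bytes / sizeof(float); // bytes -> elements
    return q[static_cast<size_t>(row) * stride_elems];
}
```

Note that row 0 is addressed correctly whether or not the division is done, since the stride is multiplied by zero. That may be why a single-row workload such as token generation could appear to work while multi-row prompt processing did not, but this is an interpretation and not something the commit message states.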
---------
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
Diffstat (limited to 'examples/server_embd.py')
0 files changed, 0 insertions, 0 deletions