path: root/examples/batched.swift
author Kawrakow <iwankawrakow@gmail.com> 2025-03-03 15:17:51 +0200
committer GitHub <noreply@github.com> 2025-03-03 15:17:51 +0200
commit a87e54db6ec2409284a55f029d4abe9e50990064
tree 920bb8ce4fbd35e54bda3b61a86d0f87c2ac0ede /examples/batched.swift
parent a89adaa78f505675be7be6180f419b4b0158c15a
Flash MLA (CPU only) (#240)
* FlashMLA - it finally works (on the CPU)
* FlashMLA: allow for f16 and bf16 cache in addition to q8_0
* It works with ggml FA, not with iqk FA
* WIP
* FlashMLA: it now works with iqk

  I had forgotten to divide the Q stride by sizeof(float) and that's why, very confusingly, it was working for TG but not for PP.
* WIP
* FlashMLA: that should be it for now

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
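The stride bug mentioned in the message can be illustrated with a minimal sketch. It is not code from this commit: the function names are hypothetical, and it assumes only that the row stride is stored in bytes (as ggml does in a tensor's nb[] array) while the kernel indexes a float pointer. Note that row 0 gets the same offset either way, so an access pattern that only ever touches the first row would not expose the mistake; whether that is why TG worked and PP did not is an assumption here, not something the commit states.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: offset of row `row`, in float elements, for a matrix
 * whose row stride is given in bytes. The division by sizeof(float) is the
 * step the commit message says was forgotten. */
size_t row_offset_floats(size_t stride_bytes, size_t row) {
    assert(stride_bytes % sizeof(float) == 0); /* stride must be whole floats */
    return row * (stride_bytes / sizeof(float));
}

/* Hypothetical buggy variant: the byte stride is used as if it were already
 * an element stride, so every row after the first lands sizeof(float) times
 * too far from the base pointer. */
size_t row_offset_buggy(size_t stride_bytes, size_t row) {
    return row * stride_bytes;
}
```

For a row of 8 floats, the two functions agree on row 0 and disagree by a factor of sizeof(float) on every later row.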
Diffstat (limited to 'examples/batched.swift')
0 files changed, 0 insertions, 0 deletions