author: Kawrakow <iwankawrakow@gmail.com>, 2025-03-07 09:46:58 +0200
committer: GitHub <noreply@github.com>, 2025-03-07 09:46:58 +0200
commit: 3d85a1d66302989401f92a5ae347577b03cbdaa7
tree: 7d9ea15568de65954ebddbf71792ad781841fd7f /examples/quantize
parent: c67a37b251fc22b0f8b8313ea5c76a73ff6ed49f
Better FlashMLA (#243)
* This is a better FA for TG

  It should benefit MLA and GQA. Tested to work with DeepSeek-Lite MLA, not yet for GQA. For tg64@pp8192 it is ~13% faster than MLA without FA, and 57% faster than the main branch FA.

* WIP

* Cleanup

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
Diffstat (limited to 'examples/quantize')
0 files changed, 0 insertions, 0 deletions