path: root/ggml-cuda/fattn-common.cuh
Age         Commit message                                              Author
2024-06-10  CUDA: use tensor cores for MMQ (#7676)                      Johannes Gäßler
2024-06-01  CUDA: fix Pascal FA, deq. KV to FP16 for batch > 8 (#7681)  Johannes Gäßler
2024-06-01  CUDA: quantized KV support for FA vec (#7527)               Johannes Gäßler
2024-05-18  CUDA: deduplicate FlashAttention code (#7352)               Johannes Gäßler
2024-05-12  CUDA: add FP32 FlashAttention vector kernel (#7188)         Johannes Gäßler