| Date | Commit message | Author |
|---|---|---|
| 2024-06-10 | CUDA: use tensor cores for MMQ (#7676) | Johannes Gäßler |
| 2024-06-01 | CUDA: fix Pascal FA, deq. KV to FP16 for batch > 8 (#7681) | Johannes Gäßler |
| 2024-06-01 | CUDA: quantized KV support for FA vec (#7527) | Johannes Gäßler |
| 2024-05-18 | CUDA: deduplicate FlashAttention code (#7352) | Johannes Gäßler |
| 2024-05-12 | CUDA: add FP32 FlashAttention vector kernel (#7188) | Johannes Gäßler |
