| Date | Commit message | Author |
|---|---|---|
| 2024-06-10 | CUDA: use tensor cores for MMQ (#7676) | Johannes Gäßler |
| 2024-06-05 | CUDA: refactor mmq, dmmv, mmvq (#7716) | Johannes Gäßler |
| 2024-05-28 | update HIP_UMA #7399 (#7414) | Djip007 |
| 2024-05-18 | CUDA: deduplicate FlashAttention code (#7352) | Johannes Gäßler |
| 2024-05-18 | cuda : add half2 __shfl_xor() for ROCm 5.5 (#7263) | Engininja2 |
| 2024-05-12 | CUDA: add FP32 FlashAttention vector kernel (#7188) | Johannes Gäßler |
| 2024-05-09 | CUDA: generalize FP16 fattn vec kernel (#7061) | Johannes Gäßler |
| 2024-05-08 | Introduction of CUDA Graphs to LLama.cpp (#6766) | agray3 |
| 2024-05-01 | CUDA: CUDART < 11.7 workaround for __hmax, __hmax2 (#7019) | Johannes Gäßler |
| 2024-04-30 | ggml : add Flash Attention (#5021) | Georgi Gerganov |
| 2024-04-09 | llama : add Command R Plus support (#6491) | Carolinabanana |
| 2024-03-29 | sync : ggml (#6351) | Georgi Gerganov |
| 2024-03-25 | cuda : refactor into multiple files (#6269) | slaren |
