ik_llama.cpp.git (branch: main) - log for path ggml-cuda/common.cuh
Age         Commit message                                               Author
2024-06-22  Bitnet(1.75 bpw): higher precision fp8 scale                 Iwan Kawrakow
2024-06-22  Bitnet(2.25 bpw): CUDA                                       Iwan Kawrakow
2024-06-22  bitnet: CUDA, scalar, AVX2                                   Iwan Kawrakow
2024-06-20  CUDA: stream-k decomposition for MMQ (#8018)                 Johannes Gäßler
2024-06-14  CUDA: faster q2_K, q3_K MMQ + int8 tensor cores (#7921)      Johannes Gäßler
2024-06-10  CUDA: use tensor cores for MMQ (#7676)                       Johannes Gäßler
2024-06-05  CUDA: refactor mmq, dmmv, mmvq (#7716)                       Johannes Gäßler
2024-05-28  update HIP_UMA #7399 (#7414)                                 Djip007
2024-05-18  CUDA: deduplicate FlashAttention code (#7352)                Johannes Gäßler
2024-05-18  cuda : add half2 __shfl_xor() for ROCm 5.5 (#7263)           Engininja2
2024-05-12  CUDA: add FP32 FlashAttention vector kernel (#7188)          Johannes Gäßler
2024-05-09  CUDA: generalize FP16 fattn vec kernel (#7061)               Johannes Gäßler
2024-05-08  Introduction of CUDA Graphs to LLama.cpp (#6766)             agray3
2024-05-01  CUDA: CUDART < 11.7 workaround for __hmax, __hmax2 (#7019)   Johannes Gäßler
2024-04-30  ggml : add Flash Attention (#5021)                           Georgi Gerganov
2024-04-09  llama : add Command R Plus support (#6491)                   Carolinabanana
2024-03-29  sync : ggml (#6351)                                          Georgi Gerganov
2024-03-25  cuda : refactor into multiple files (#6269)                  slaren