Repository: ik_llama.cpp.git (branch: main)
Commit log for path /ggml-cuda, one entry per line: date, commit message, author.
2024-06-17  Add support for sqrt on CUDA (#7953)  (Calvin Laurenson)
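Unary ops like this are usually added to ggml-cuda as one-thread-per-element kernels plus a small launch helper. A minimal sketch of that pattern, assuming hypothetical names (`sqrt_f32`, `CUDA_SQRT_BLOCK_SIZE`) rather than the actual code merged in #7953:

```cuda
// Sketch of an elementwise sqrt op in the one-thread-per-element style.
// Illustrative only; kernel and macro names are assumptions.
#include <cuda_runtime.h>

#define CUDA_SQRT_BLOCK_SIZE 256

__global__ void sqrt_f32(const float * x, float * dst, const int k) {
    const int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i >= k) {
        return; // guard the partial tail block
    }
    dst[i] = sqrtf(x[i]); // one thread computes one element
}

static void sqrt_f32_cuda(const float * x, float * dst, const int k, cudaStream_t stream) {
    const int num_blocks = (k + CUDA_SQRT_BLOCK_SIZE - 1) / CUDA_SQRT_BLOCK_SIZE;
    sqrt_f32<<<num_blocks, CUDA_SQRT_BLOCK_SIZE, 0, stream>>>(x, dst, k);
}
```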
2024-06-16  cuda : fix bounds check for src0 rows in MMVQ kernel (whisper/2231)  (Georgi Gerganov)
2024-06-14  CUDA: faster q2_K, q3_K MMQ + int8 tensor cores (#7921)  (Johannes Gäßler)
2024-06-12  CUDA: fix broken oob check for FA vec f32 kernel (#7904)  (Johannes Gäßler)
2024-06-12  tests : add non-cont unary tests (#7857)  (Georgi Gerganov)
2024-06-11  CUDA: int8 tensor cores for MMQ (q4_K, q5_K, q6_K) (#7860)  (Johannes Gäßler)
2024-06-10  CUDA: use tensor cores for MMQ (#7676)  (Johannes Gäßler)
2024-06-09  CUDA: revise q8_1 data layout for mul_mat_q (#7824)  (Johannes Gäßler)
2024-06-05  CUDA: refactor mmq, dmmv, mmvq (#7716)  (Johannes Gäßler)
2024-06-05  ggml : refactor rope norm/neox (#7634)  (Georgi Gerganov)
2024-06-01  Fix FlashAttention debug test, FP32 assert (#7684)  (Johannes Gäßler)
2024-06-01  CUDA: fix Pascal FA, deq. KV to FP16 for batch > 8 (#7681)  (Johannes Gäßler)
2024-06-01  CUDA: quantized KV support for FA vec (#7527)  (Johannes Gäßler)
2024-05-29  ggml : fix YARN + add tests + add asserts (#7617)  (Georgi Gerganov)
2024-05-29  cuda : non-cont concat support (#7610)  (Georgi Gerganov)
2024-05-28  ggml : generalize GGML_OP_CONCAT (#7563)  (Georgi Gerganov)
2024-05-28  update HIP_UMA #7399 (#7414)  (Djip007)
2024-05-23  ggml : drop support for QK_K=64 (#7473)  (Georgi Gerganov)
2024-05-23  CUDA: fix FA out-of-bounds reads (#7479)  (Johannes Gäßler)
2024-05-22  CUDA: fix FA out-of-bounds writes (#7465)  (Johannes Gäßler)
2024-05-22  cuda : fix compile warning (#7454)  (Georgi Gerganov)
2024-05-22  CUDA: remove incorrect precision check (#7454)  (Johannes Gäßler)
2024-05-22  cuda : fix rope + add tests (#7452)  (Georgi Gerganov)
2024-05-21  llama : add phi3 128K model support (#7225)  (liuwei-git)
2024-05-21  CUDA: fix unused warning in mmq.cu (#7442)  (Johannes Gäßler)
2024-05-21  CUDA: deduplicate mmq code (#7397)  (Johannes Gäßler)
2024-05-18  CUDA: deduplicate FlashAttention code (#7352)  (Johannes Gäßler)
2024-05-18  cuda : add half2 __shfl_xor() for ROCm 5.5 (#7263)  (Engininja2)
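ROCm 5.5 lacks a `half2` overload of `__shfl_xor()`, so warp reductions over `half2` need a shim. A sketch of the usual workaround, with an assumed wrapper name: shuffle the 32-bit representation of the `half2` and reinterpret it back.

```cuda
// Sketch of a half2 __shfl_xor() shim for HIP/ROCm builds that lack the
// overload. Illustrative; the wrapper name and any guard macro used in
// the tree are assumptions.
#include <cuda_fp16.h>

static __device__ __forceinline__ half2 shfl_xor_half2(half2 var, int lane_mask, int width) {
    union {
        half2 h2;
        int   b32;
    } u;                                         // view the 4 bytes of a half2 as one int
    u.h2  = var;
    u.b32 = __shfl_xor(u.b32, lane_mask, width); // the int overload exists on ROCm 5.5
    return u.h2;
}
```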
2024-05-17  CUDA: faster large batch FA without tensor cores (#7314)  (Johannes Gäßler)
2024-05-15  ggml : add `ggml_upscale_ext` (ggml/814)  (John Balis)
2024-05-12  CUDA: add FP32 FlashAttention vector kernel (#7188)  (Johannes Gäßler)
2024-05-11  feat: implemented sigmoid function (ggml/806)  (Justina Cho)
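Sigmoid is another elementwise op, sigmoid(x) = 1 / (1 + e^(-x)), so it follows the same one-thread-per-element pattern. A minimal sketch with an assumed kernel name, not the code from ggml/806:

```cuda
// Sketch of an elementwise sigmoid kernel; name and layout are assumptions.
#include <cuda_runtime.h>

__global__ void sigmoid_f32(const float * x, float * dst, const int k) {
    const int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i >= k) {
        return;
    }
    dst[i] = 1.0f / (1.0f + expf(-x[i])); // sigmoid(x) = 1 / (1 + e^-x)
}
```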
2024-05-11  ggml : full ALiBi support (#7192)  (Georgi Gerganov)
2024-05-09  CUDA: generalize FP16 fattn vec kernel (#7061)  (Johannes Gäßler)
2024-05-08  Introduction of CUDA Graphs to LLama.cpp (#6766)  (agray3)
2024-05-01  CUDA: CUDART < 11.7 workaround for __hmax, __hmax2 (#7019)  (Johannes Gäßler)
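The `__hmax`/`__hmax2` intrinsics cannot be relied on before CUDA 11.7, so a compatibility wrapper falls back to comparing via `float`. A sketch of the idea, with assumed function names and version guard:

```cuda
// Sketch of __hmax/__hmax2 fallbacks for toolkits older than CUDA 11.7.
// Illustrative; helper names and the exact version check are assumptions.
#include <cuda_fp16.h>

static __device__ __forceinline__ half hmax_compat(const half a, const half b) {
#if CUDART_VERSION >= 11070
    return __hmax(a, b);                               // native intrinsic on newer toolkits
#else
    return __half2float(a) > __half2float(b) ? a : b;  // compare via float on older ones
#endif
}

static __device__ __forceinline__ half2 hmax2_compat(const half2 a, const half2 b) {
#if CUDART_VERSION >= 11070
    return __hmax2(a, b);
#else
    // take the max of the low and high halves independently
    return make_half2(hmax_compat(__low2half(a),  __low2half(b)),
                      hmax_compat(__high2half(a), __high2half(b)));
#endif
}
```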
2024-04-30  ggml : add Flash Attention (#5021)  (Georgi Gerganov)
2024-04-29  Fix more int overflow during quant (PPL/CUDA). (#6563)  (DAN™)
2024-04-18  ggml : group all experts in a single ggml_mul_mat_id (#6505)  (slaren)
2024-04-09  llama : add Command R Plus support (#6491)  (Carolinabanana)
2024-04-03  ggml : mul_mat_id use the same tensor for all the experts (#6387)  (slaren)
2024-03-29  sync : ggml (#6351)  (Georgi Gerganov)
2024-03-26  IQ1_M: 1.75 bpw quantization (#6302)  (Kawrakow)
2024-03-25  cuda : fix LLAMA_CUDA_F16 build (#6298)  (slaren)
2024-03-25  cuda : refactor into multiple files (#6269)  (slaren)