ik_llama.cpp.git (branch: main)
path: root / ggml-cuda.cu
Age         Commit message (Author)
2024-06-22  Bitnet(2.25 bpw): CUDA  (Iwan Kawrakow)
2024-06-22  bitnet: CUDA, scalar, AVX2  (Iwan Kawrakow)
2024-06-20  CUDA: stream-k decomposition for MMQ (#8018)  (Johannes Gäßler)
2024-06-17  Add support for sqrt on CUDA (#7953)  (Calvin Laurenson)
2024-06-14  CUDA: faster q2_K, q3_K MMQ + int8 tensor cores (#7921)  (Johannes Gäßler)
2024-06-13  move BLAS to a separate backend (#6210)  (slaren)
2024-06-12  tests : add non-cont unary tests (#7857)  (Georgi Gerganov)
2024-06-09  CUDA: revise q8_1 data layout for mul_mat_q (#7824)  (Johannes Gäßler)
2024-06-05  CUDA: refactor mmq, dmmv, mmvq (#7716)  (Johannes Gäßler)
2024-06-04  Allow number of nodes in CUDA graph to change (#7738)  (agray3)
2024-06-01  CUDA: quantized KV support for FA vec (#7527)  (Johannes Gäßler)
2024-05-29  ggml : fix YARN + add tests + add asserts (#7617)  (Georgi Gerganov)
2024-05-28  update HIP_UMA #7399 (#7414)  (Djip007)
2024-05-27  Allow multiple copy function pointers for CUDA graph kernel param updates (#7...  (agray3)
2024-05-19  cuda : clear error after buffer allocation failure (#7376)  (slaren)
2024-05-19  Capture CUDA logging output (#7298)  (fraxy-v)
2024-05-15  Avoid unnecessarily disabling CUDA graphs (#7302)  (agray3)
2024-05-12  CUDA: add FP32 FlashAttention vector kernel (#7188)  (Johannes Gäßler)
2024-05-11  feat: implemented sigmoid function (ggml/806)  (Justina Cho)
2024-05-11  ggml : full ALiBi support (#7192)  (Georgi Gerganov)
2024-05-08  Introduction of CUDA Graphs to LLama.cpp (#6766)  (agray3)
2024-05-06  Add an option to build without CUDA VMM (#7067)  (William Tambellini)
2024-04-30  ggml : add Flash Attention (#5021)  (Georgi Gerganov)
2024-04-18  ggml : group all experts in a single ggml_mul_mat_id (#6505)  (slaren)
2024-04-14  CUDA: fix matrix multiplication logic for tests (#6667)  (Johannes Gäßler)
2024-04-09  llama : add Command R Plus support (#6491)  (Carolinabanana)
2024-04-07  ggml: bypass code incompatible with CUDA < 11.1 (whisper/2020)  (Slava Primenko)
2024-04-03  ggml : mul_mat_id use the same tensor for all the experts (#6387)  (slaren)
2024-03-26  llama : greatly reduce output buffer memory usage (#6122)  (compilade)
2024-03-26  IQ1_M: 1.75 bpw quantization (#6302)  (Kawrakow)
2024-03-25  cuda : refactor into multiple files (#6269)  (slaren)
2024-03-22  cuda : add LLAMA_CUDA_NO_PEER_COPY to workaround broken ROCm p2p copy (#6208)  (slaren)
2024-03-21  cuda : disable host register by default (#6206)  (slaren)
2024-03-21  cuda : fix LLAMA_CUDA_F16 build (#6197)  (slaren)
2024-03-21  Add ability to use Q5_0, Q5_1, and IQ4_NL for quantized K cache (#6183)  (Kawrakow)
2024-03-21  cuda : fix conflict with std::swap (#6186)  (slaren)
2024-03-20  cuda : print the returned error when CUDA initialization fails (#6185)  (slaren)
2024-03-20  cuda : refactor to remove global resources (#6170)  (slaren)
2024-03-18  backend : offload large batches to GPU (#6083)  (slaren)
2024-03-15  cuda : disable unused cudaLaunchHostFunc code (#6078)  (slaren)
2024-03-13  llama : add pipeline parallelism support (#6017)  (slaren)
2024-03-12  ggml : reuse quantum structs across backends (#5943)  (Georgi Gerganov)
2024-03-11  1.5 bit: we can do even better (#5999)  (Kawrakow)
2024-03-11  Better 1.5 bit quantization (#5971)  (Kawrakow)
2024-03-09  ggml : add ggml-common.h to deduplicate shared code (#5940)  (Georgi Gerganov)
2024-03-04  ggml : introduce ggml_status (ggml/750)  (Michael Podvitskiy)
2024-03-04  add some new ops, fix some operators and add batch operations to certain oper...  (leejet)
2024-03-03  cuda : fix data race in soft max (#5853)  (slaren)
2024-03-02  ggml : IQ3_S improvements (#5829)  (Kawrakow)
2024-02-28  Introduce backend GUIDs (ggml/743)  (UEXTM.com)