ik_llama.cpp.git (branch: main)
Commit log for ggml-cuda.cu

Date        Commit message  [Author]
2023-12-07  sync : ggml (new ops, tests, backend, etc.) (#4359)  [Georgi Gerganov]
2023-12-07  llama : per-layer KV cache + quantum K cache (#4309)  [Georgi Gerganov]
2023-12-01  ggml : add ggml_soft_max_ext (#4256)  [Georgi Gerganov]
2023-11-24  ggml-cuda : support stablelm rope (#4156)  [slaren]
2023-11-23  Fix incorrect format strings and uninitialized variables. (#4133)  [Haohui Mai]
2023-11-18  Clean up ggml-cuda.cu warnings when compiling with clang (for ROCM) (#4124)  [Kerfuffle]
2023-11-17  cuda : get_row_rounding F32 (#4095)  [Andrew Godfrey]
2023-11-17  llama : fix data units (#4101)  [Georgi Gerganov]
2023-11-15  ggml-cuda : increase max graph size (#4084)  [slaren]
2023-11-13  ggml : sync (im2col, GPU conv, 32-bit arm compat) (#4060)  [Georgi Gerganov]
2023-11-13  sync : ggml (backend v2) (#3912)  [Georgi Gerganov]
2023-11-13  Add ReLU and SQR CUDA ops to (partially) fix Persimmon offloading (#4041)  [Kerfuffle]
2023-11-07  cuda : supports running on CPU for GGML_USE_CUBLAS=ON build (#3946)  [Meng Zhang]
2023-11-05  ggml-cuda : fix f16 mul mat (#3961)  [slaren]
2023-11-05  cuda : fix disabling device with --tensor-split 1,0 (#3951)  [Jared Van Bortel]
2023-11-05  cuda : revert CUDA pool stuff (#3944)  [slaren]
2023-11-03  ggml-cuda : move row numbers to x grid dim in mmv kernels (#3921)  [slaren]
2023-11-02  cuda : add ROCM aliases for CUDA pool stuff (#3918)  [Kerfuffle]
2023-11-02  cuda : fix const ptrs warning causing ROCm build issues (#3913)  [Georgi Gerganov]
2023-11-02  cuda : use CUDA memory pool with async memory allocation/deallocation when av...  [Oleksii Maryshchenko]
2023-11-02  cuda : check if this fixes Pascal card regression (#3882)  [Georgi Gerganov]
2023-11-02  cuda : fix RoPE after #2268 (#3897)  [cebtenzzre]
2023-11-01  ggml-cuda : compute ptrs for cublasGemmBatchedEx in a kernel (#3891)  [slaren]
2023-11-01  llama : implement YaRN RoPE scaling (#2268)  [cebtenzzre]
2023-11-01  finetune : add -ngl parameter (#3762)  [Andrew Godfrey]
2023-10-27  cuda : improve text-generation and batched decoding performance (#3776)  [Georgi Gerganov]
2023-10-25  batched-bench : print params at start  [Georgi Gerganov]
2023-10-24  sync : ggml (conv ops + cuda MSVC fixes) (#3765)  [Georgi Gerganov]
2023-10-24  cuda : add batched cuBLAS GEMM for faster attention (#3749)  [Georgi Gerganov]
2023-10-10  llm : add MPT support (#3417)  [Jan Ploski]
2023-10-08  sync : ggml (ggml-backend) (#3548)  [Georgi Gerganov]
2023-09-30  ggml-cuda : perform cublas mat mul of quantized types as f16 (#3412)  [slaren]
2023-09-28  llama.cpp : split llama_context_params into model and context params (#3301)  [slaren]
2023-09-28  llama : custom attention mask + parallel decoding + no context swaps (#3228)  [Georgi Gerganov]
2023-09-28  ggml-cuda : perform cublas fp16 matrix multiplication as fp16 (#3370)  [slaren]
2023-09-17  CUDA: fix peer access logic (#3231)  [Johannes Gäßler]
2023-09-17  CUDA: enable peer access between devices (#2470)  [Johannes Gäßler]
2023-09-17  CUDA: fix scratch malloced on non-main device (#3220)  [Johannes Gäßler]
2023-09-16  Enable build with CUDA 11.0 (make) (#3132)  [Vlad]
2023-09-13  CUDA: mul_mat_q RDNA2 tunings (#2910)  [Johannes Gäßler]
2023-09-13  CUDA: fix LoRAs (#3130)  [Johannes Gäßler]
2023-09-11  CUDA: fix mul_mat_q not used for output tensor (#3127)  [Johannes Gäßler]
2023-09-11  CUDA: lower GPU latency + fix Windows performance (#3110)  [Johannes Gäßler]
2023-09-11  CUDA: add device number to error messages (#3112)  [Johannes Gäßler]
2023-09-08  sync : ggml (CUDA GLM RoPE + POSIX) (#3082)  [Georgi Gerganov]
2023-09-04  2x faster (rms) norm cuda kernels (3.7% e2e improvement) (#2985)  [Jiahao Li]
2023-09-01  cuda : vsubss4 for older versions of ROCm/clang (#2942)  [Engininja2]
2023-08-28  CUDA: fix RoPE asserts, block sizes (#2833)  [Johannes Gäßler]
2023-08-27  falcon : fix CUDA inference by making K and Q contiguous (#2830)  [Georgi Gerganov]
2023-08-27  k_quants tuning for Falcon-7b (#2816)  [Kawrakow]