path: root/ggml-cuda.cu
Age        | Commit message                                                              | Author
2023-12-07 | sync : ggml (new ops, tests, backend, etc.) (#4359)                         | Georgi Gerganov
2023-12-07 | llama : per-layer KV cache + quantum K cache (#4309)                        | Georgi Gerganov
2023-12-01 | ggml : add ggml_soft_max_ext (#4256)                                        | Georgi Gerganov
2023-11-24 | ggml-cuda : support stablelm rope (#4156)                                   | slaren
2023-11-23 | Fix incorrect format strings and uninitialized variables. (#4133)           | Haohui Mai
2023-11-18 | Clean up ggml-cuda.cu warnings when compiling with clang (for ROCM) (#4124) | Kerfuffle
2023-11-17 | cuda : get_row_rounding F32 (#4095)                                         | Andrew Godfrey
2023-11-17 | llama : fix data units (#4101)                                              | Georgi Gerganov
2023-11-15 | ggml-cuda : increase max graph size (#4084)                                 | slaren
2023-11-13 | ggml : sync (im2col, GPU conv, 32-bit arm compat) (#4060)                   | Georgi Gerganov
2023-11-13 | sync : ggml (backend v2) (#3912)                                            | Georgi Gerganov
2023-11-13 | Add ReLU and SQR CUDA ops to (partially) fix Persimmon offloading (#4041)   | Kerfuffle
2023-11-07 | cuda : supports running on CPU for GGML_USE_CUBLAS=ON build (#3946)         | Meng Zhang
2023-11-05 | ggml-cuda : fix f16 mul mat (#3961)                                         | slaren
2023-11-05 | cuda : fix disabling device with --tensor-split 1,0 (#3951)                 | Jared Van Bortel
2023-11-05 | cuda : revert CUDA pool stuff (#3944)                                       | slaren
2023-11-03 | ggml-cuda : move row numbers to x grid dim in mmv kernels (#3921)           | slaren
2023-11-02 | cuda : add ROCM aliases for CUDA pool stuff (#3918)                         | Kerfuffle
2023-11-02 | cuda : fix const ptrs warning causing ROCm build issues (#3913)             | Georgi Gerganov
2023-11-02 | cuda : use CUDA memory pool with async memory allocation/deallocation when av... | Oleksii Maryshchenko
2023-11-02 | cuda : check if this fixes Pascal card regression (#3882)                   | Georgi Gerganov
2023-11-02 | cuda : fix RoPE after #2268 (#3897)                                         | cebtenzzre
2023-11-01 | ggml-cuda : compute ptrs for cublasGemmBatchedEx in a kernel (#3891)        | slaren
2023-11-01 | llama : implement YaRN RoPE scaling (#2268)                                 | cebtenzzre
2023-11-01 | finetune : add -ngl parameter (#3762)                                       | Andrew Godfrey
2023-10-27 | cuda : improve text-generation and batched decoding performance (#3776)     | Georgi Gerganov
2023-10-25 | batched-bench : print params at start                                       | Georgi Gerganov
2023-10-24 | sync : ggml (conv ops + cuda MSVC fixes) (#3765)                            | Georgi Gerganov
2023-10-24 | cuda : add batched cuBLAS GEMM for faster attention (#3749)                 | Georgi Gerganov
2023-10-10 | llm : add MPT support (#3417)                                               | Jan Ploski
2023-10-08 | sync : ggml (ggml-backend) (#3548)                                          | Georgi Gerganov
2023-09-30 | ggml-cuda : perform cublas mat mul of quantized types as f16 (#3412)        | slaren
2023-09-28 | llama.cpp : split llama_context_params into model and context params (#3301) | slaren
2023-09-28 | llama : custom attention mask + parallel decoding + no context swaps (#3228) | Georgi Gerganov
2023-09-28 | ggml-cuda : perform cublas fp16 matrix multiplication as fp16 (#3370)       | slaren
2023-09-17 | CUDA: fix peer access logic (#3231)                                         | Johannes Gäßler
2023-09-17 | CUDA: enable peer access between devices (#2470)                            | Johannes Gäßler
2023-09-17 | CUDA: fix scratch malloced on non-main device (#3220)                       | Johannes Gäßler
2023-09-16 | Enable build with CUDA 11.0 (make) (#3132)                                  | Vlad
2023-09-13 | CUDA: mul_mat_q RDNA2 tunings (#2910)                                       | Johannes Gäßler
2023-09-13 | CUDA: fix LoRAs (#3130)                                                     | Johannes Gäßler
2023-09-11 | CUDA: fix mul_mat_q not used for output tensor (#3127)                      | Johannes Gäßler
2023-09-11 | CUDA: lower GPU latency + fix Windows performance (#3110)                   | Johannes Gäßler
2023-09-11 | CUDA: add device number to error messages (#3112)                           | Johannes Gäßler
2023-09-08 | sync : ggml (CUDA GLM RoPE + POSIX) (#3082)                                 | Georgi Gerganov
2023-09-04 | 2x faster (rms) norm cuda kernels (3.7% e2e improvement) (#2985)            | Jiahao Li
2023-09-01 | cuda : vsubss4 for older versions of ROCm/clang (#2942)                     | Engininja2
2023-08-28 | CUDA: fix RoPE asserts, block sizes (#2833)                                 | Johannes Gäßler
2023-08-27 | falcon : fix CUDA inference by making K and Q contiguous (#2830)            | Georgi Gerganov
2023-08-27 | k_quants tuning for Falcon-7b (#2816)                                       | Kawrakow