path: root/ggml-cuda.cu
Age        | Commit message                                                                | Author
2024-01-03 | cuda : simplify expression                                                    | Georgi Gerganov
2024-01-03 | cuda : mark I16 and I32 ops as unsupported                                    | Georgi Gerganov
2023-12-30 | CUDA: fixed tensor cores not being used on RDNA3 (#4697)                      | Johannes Gäßler
2023-12-29 | CUDA: fix tensor core logic for Pascal and HIP (#4682)                        | Johannes Gäßler
2023-12-29 | cuda: fix vmm oom issue on NVIDIA AGX Orin (#4687)                            | hydai
2023-12-29 | ggml : fix some mul mat cases + add tests for src1 F16 (ggml/669)             | bssrdf
2023-12-26 | cuda : fix vmm pool with multi GPU (#4620)                                    | slaren
2023-12-26 | Fix new CUDA10 compilation errors (#4635)                                     | FantasyGmm
2023-12-24 | cuda : improve cuda pool efficiency using virtual memory (#4606)              | slaren
2023-12-23 | fallback to CPU buffer if host buffer alloc fails (#4610)                     | slaren
2023-12-23 | CUDA: fixed row rounding for 0 tensor splits (#4594)                          | Johannes Gäßler
2023-12-22 | sync : ggml (fix im2col) (#4591)                                              | Georgi Gerganov
2023-12-22 | cuda : fix jetson compile error (#4560)                                       | FantasyGmm
2023-12-22 | Fix CudaMemcpy direction (#4599)                                              | Henrik Forstén
2023-12-22 | llama : fix platforms without mmap (#4578)                                    | slaren
2023-12-21 | ggml : change ggml_scale to take a float instead of tensor (#4573)            | Georgi Gerganov
2023-12-21 | llama : initial ggml-backend integration (#4520)                              | slaren
2023-12-21 | cuda : ROCm AMD Unified Memory Architecture (UMA) handling (#4449)            | Erik Garrison
2023-12-21 | ggml-cuda: Fix HIP build by adding define for __trap (#4569)                  | arlo-phoenix
2023-12-21 | CUDA: mul_mat_id always on GPU for batches >= 32 (#4553)                      | Johannes Gäßler
2023-12-21 | cuda : better error message for ggml_get_rows (#4561)                         | bobqianic
2023-12-21 | cuda : replace asserts in wrong architecture checks with __trap (#4556)       | slaren
2023-12-21 | Fix access violation in ggml_cuda_free_data if tensor->extra is NULL (#4554)  | LoganDark
2023-12-20 | CUDA: Faster Mixtral prompt processing (#4538)                                | Johannes Gäßler
2023-12-18 | ggml-cuda: Fix HIP build (#4528)                                              | arlo-phoenix
2023-12-18 | llama : add phi-2 + fix NeoX rope + ggml_mul_mat_set_prec (#4490)             | Ebey Abraham
2023-12-14 | ggml : use ggml_row_size where possible (#4472)                               | slaren
2023-12-13 | sync : ggml (SD ops, tests, kernels) (#4444)                                  | Georgi Gerganov
2023-12-13 | llama : add Mixtral support (#4406)                                           | slaren
2023-12-07 | sync : ggml (new ops, tests, backend, etc.) (#4359)                           | Georgi Gerganov
2023-12-07 | llama : per-layer KV cache + quantum K cache (#4309)                          | Georgi Gerganov
2023-12-01 | ggml : add ggml_soft_max_ext (#4256)                                          | Georgi Gerganov
2023-11-24 | ggml-cuda : support stablelm rope (#4156)                                     | slaren
2023-11-23 | Fix incorrect format strings and uninitialized variables. (#4133)             | Haohui Mai
2023-11-18 | Clean up ggml-cuda.cu warnings when compiling with clang (for ROCM) (#4124)   | Kerfuffle
2023-11-17 | cuda : get_row_rounding F32 (#4095)                                           | Andrew Godfrey
2023-11-17 | llama : fix data units (#4101)                                                | Georgi Gerganov
2023-11-15 | ggml-cuda : increase max graph size (#4084)                                   | slaren
2023-11-13 | ggml : sync (im2col, GPU conv, 32-bit arm compat) (#4060)                     | Georgi Gerganov
2023-11-13 | sync : ggml (backend v2) (#3912)                                              | Georgi Gerganov
2023-11-13 | Add ReLU and SQR CUDA ops to (partially) fix Persimmon offloading (#4041)     | Kerfuffle
2023-11-07 | cuda : supports running on CPU for GGML_USE_CUBLAS=ON build (#3946)           | Meng Zhang
2023-11-05 | ggml-cuda : fix f16 mul mat (#3961)                                           | slaren
2023-11-05 | cuda : fix disabling device with --tensor-split 1,0 (#3951)                   | Jared Van Bortel
2023-11-05 | cuda : revert CUDA pool stuff (#3944)                                         | slaren
2023-11-03 | ggml-cuda : move row numbers to x grid dim in mmv kernels (#3921)             | slaren
2023-11-02 | cuda : add ROCM aliases for CUDA pool stuff (#3918)                           | Kerfuffle
2023-11-02 | cuda : fix const ptrs warning causing ROCm build issues (#3913)               | Georgi Gerganov
2023-11-02 | cuda : use CUDA memory pool with async memory allocation/deallocation when av... | Oleksii Maryshchenko
2023-11-02 | cuda : check if this fixes Pascal card regression (#3882)                     | Georgi Gerganov