author | Georgi Gerganov <ggerganov@gmail.com> | 2023-10-24 16:48:37 +0300 |
---|---|---|
committer | GitHub <noreply@github.com> | 2023-10-24 16:48:37 +0300 |
commit | 2b4ea35e56792064598e922e46d081e02bc96b94 (patch) | |
tree | dea0a7b3e47c7d876cbce5d30b31c4c78d7bb030 /examples/server/deps.sh | |
parent | daab3d7f45832e10773c99f3484b0d5b14d86c0c (diff) |
cuda : add batched cuBLAS GEMM for faster attention (#3749)
* cmake : add helper for faster CUDA builds
* batched : add NGL arg
* ggml : skip nops in compute_forward
* cuda : minor indentation
* cuda : batched cuBLAS GEMMs for src0 F16 and src1 F32 (attention ops)
* Apply suggestions from code review
These changes plus:
```c++
#define cublasGemmBatchedEx hipblasGemmBatchedEx
```
are needed to compile with ROCm. I haven't done performance testing, but it seems to work.
I couldn't figure out how to propose a change for lines outside what the pull changed. Also, this is my first time trying to create a multi-part review, so please forgive me if I mess something up.
* cuda : add ROCm / hipBLAS cublasGemmBatchedEx define
* cuda : add cublasGemmStridedBatchedEx for non-broadcasted cases
* cuda : reduce mallocs in cublasGemmBatchedEx branch
* cuda : add TODO for calling cublas from kernel + using mem pool
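
The last few bullets distinguish two cuBLAS entry points: `cublasGemmStridedBatchedEx` for the non-broadcasted case and `cublasGemmBatchedEx` (which takes arrays of matrix pointers) otherwise. The CPU-only sketch below illustrates why the split is natural. It is not the code from this commit: the helper names `src0_offsets` and `has_constant_stride` are hypothetical, and it assumes that broadcasting means each src0 matrix is reused for `n_batch1 / n_batch0` consecutive src1 matrices.

```c++
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the commit): for each of the n_batch1 GEMMs,
// compute the element offset of the src0 matrix it reads, given that src0
// holds n_batch0 contiguous matrices of matrix_elems elements each.
// Assumption: n_batch1 is a multiple of n_batch0 (the broadcast ratio).
std::vector<std::size_t> src0_offsets(std::size_t n_batch0,
                                      std::size_t n_batch1,
                                      std::size_t matrix_elems) {
    assert(n_batch0 > 0 && n_batch1 % n_batch0 == 0);
    const std::size_t ratio = n_batch1 / n_batch0; // 1 means no broadcasting
    std::vector<std::size_t> offsets(n_batch1);
    for (std::size_t i = 0; i < n_batch1; ++i) {
        offsets[i] = (i / ratio) * matrix_elems;   // src0 matrix reused `ratio` times
    }
    return offsets;
}

// True when offsets[i] == i * stride for every i, i.e. the batch can be
// described by a single stride instead of an explicit pointer array.
bool has_constant_stride(const std::vector<std::size_t>& offsets,
                         std::size_t stride) {
    for (std::size_t i = 0; i < offsets.size(); ++i) {
        if (offsets[i] != i * stride) {
            return false;
        }
    }
    return true;
}
```

Under these assumptions, a batch with equal src0 and src1 batch counts has evenly spaced offsets, so one stride describes it. A broadcasted batch repeats offsets and needs a per-GEMM pointer list.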
---------
Co-authored-by: Kerfuffle <44031344+KerfuffleV2@users.noreply.github.com>
Diffstat (limited to 'examples/server/deps.sh')
0 files changed, 0 insertions, 0 deletions