cuda : add batched cuBLAS GEMM for faster attention (#3749)

* cmake : add helper for faster CUDA builds
* batched : add NGL arg
* ggml : skip nops in compute_forward
* cuda : minor indentation
* cuda : batched cuBLAS GEMMs for src0 F16 and src1 F32 (attention ops)
* Apply suggestions from code review

These changes plus:

```c++
#define cublasGemmBatchedEx hipblasGemmBatchedEx
```

are needed to compile with ROCM. I haven't done performance testing, but it seems to work.

I couldn't figure out how to propose a change for lines outside what the pull changed, also this is the first time trying to create a multi-part review so please forgive me if I mess something up.

* cuda : add ROCm / hipBLAS cublasGemmBatchedEx define
* cuda : add cublasGemmStridedBatchedEx for non-broadcasted cases
* cuda : reduce mallocs in cublasGemmBatchedEx branch
* cuda : add TODO for calling cublas from kernel + using mem pool

---------

Co-authored-by: Kerfuffle <44031344+KerfuffleV2@users.noreply.github.com>
author: Georgi Gerganov <ggerganov@gmail.com> 2023-10-24 16:48:37 +0300
committer: GitHub <noreply@github.com> 2023-10-24 16:48:37 +0300
commit: 2b4ea35e56792064598e922e46d081e02bc96b94 (patch)
tree: dea0a7b3e47c7d876cbce5d30b31c4c78d7bb030 /ggml.c
parent: daab3d7f45832e10773c99f3484b0d5b14d86c0c (diff)
1 files changed, 4 insertions, 0 deletions
diff --git a/ggml.c b/ggml.c
index 49f3b7ab..17f0ce48 100644
--- a/ggml.c
+++ b/ggml.c
@@ -16602,6 +16602,10 @@ static void ggml_compute_forward_cross_entropy_loss_back(
 static void ggml_compute_forward(struct ggml_compute_params * params, struct ggml_tensor * tensor) {
     GGML_ASSERT(params);
 
+    if (tensor->op == GGML_OP_NONE) {
+        return;
+    }
+
 #ifdef GGML_USE_CUBLAS
     bool skip_cpu = ggml_cuda_compute_forward(params, tensor);
     if (skip_cpu) {
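
The hunk above places the `GGML_OP_NONE` check before the CUDA dispatch, so no-op nodes return without reaching `ggml_cuda_compute_forward` or the CPU path. A minimal standalone sketch of the same early-return pattern follows. Every name in it (`toy_op`, `toy_tensor`, `toy_compute_forward`, `toy_graph_compute`) is a hypothetical stand-in for illustration, not part of ggml.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration only; these are not ggml types. */
enum toy_op { TOY_OP_NONE = 0, TOY_OP_ADD, TOY_OP_MUL };

struct toy_tensor {
    enum toy_op op;
    float a, b;      /* inputs */
    float result;    /* output, left untouched for no-op nodes */
};

/* Returns 1 if the node was computed, 0 if it was skipped as a no-op. */
static int toy_compute_forward(struct toy_tensor * t) {
    assert(t != NULL);

    /* Early return, mirroring the guard added in the diff:
     * a node without an operation has nothing to dispatch. */
    if (t->op == TOY_OP_NONE) {
        return 0;
    }

    switch (t->op) {
        case TOY_OP_ADD: t->result = t->a + t->b; break;
        case TOY_OP_MUL: t->result = t->a * t->b; break;
        default: break;
    }
    return 1;
}

/* Walks an array of nodes and returns how many were actually computed. */
static int toy_graph_compute(struct toy_tensor * nodes, size_t n) {
    int computed = 0;
    for (size_t i = 0; i < n; i++) {
        computed += toy_compute_forward(&nodes[i]);
    }
    return computed;
}
```

Calling `toy_graph_compute` on an array that mixes `TOY_OP_NONE` nodes with real operations leaves the `result` field of the no-op nodes unchanged and counts only the nodes that did work.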