Adding SWIGLU unary op (#65)

* Adding GGML_UNARY_OP_SWIGLU

This commit implements the ggml op and the CPU compute forward.
I see a ~3-4% speedup in PP-512 for Phi-3.5-mini.

* GGML_UNARY_OP_SWIGLU: CUDA implementation

I observe a ~12% speedup in PP-512 (Phi-3.5-mini).

* GGML_UNARY_OP_SWIGLU: Metal implementation

We get a ~2% speedup in PP-512 (Phi-3.5-mini).

* GGML_UNARY_OP_SWIGLU: minor improvement on Metal

* GGML_UNARY_OP_SWIGLU: cleanup

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
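The commit message does not spell out what the fused op computes, so here is a minimal scalar C++ sketch of a SwiGLU forward pass. It assumes each input row holds 2n values, with the gate half first and the "up" half second (the way a Phi-3-style fused up projection is commonly split), and that each output value is silu(gate) * up. The function name `swiglu_ref` and the row layout are assumptions for illustration; this is not ggml's actual CPU kernel.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// SiLU (swish): x * sigmoid(x) = x / (1 + e^-x)
inline float silu(float x) { return x / (1.0f + std::exp(-x)); }

// Hypothetical scalar reference for a fused SwiGLU over row-major data.
// Assumed layout: each input row has 2*n floats, [gate_0..gate_{n-1} | up_0..up_{n-1}].
// Each output row has n floats: out[j] = silu(gate[j]) * up[j].
std::vector<float> swiglu_ref(const std::vector<float>& src, std::size_t nrows, std::size_t n) {
    assert(src.size() == nrows * 2 * n);  // input must hold 2*n values per row
    std::vector<float> dst(nrows * n);
    for (std::size_t r = 0; r < nrows; ++r) {
        const float* gate = src.data() + r * 2 * n;  // first half of the row
        const float* up   = gate + n;                // second half of the row
        float* out        = dst.data() + r * n;
        for (std::size_t j = 0; j < n; ++j) {
            out[j] = silu(gate[j]) * up[j];
        }
    }
    return dst;
}
```

Under that assumed layout, a single fused pass stands in for what would otherwise be separate steps (copying out the two halves, applying SiLU, then multiplying), which is a plausible source of the PP-512 gains reported above, though the commit itself does not break the speedup down.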
author: Kawrakow <iwankawrakow@gmail.com> 2024-09-28 13:37:25 +0300
committer: GitHub <noreply@github.com> 2024-09-28 13:37:25 +0300
commit: 737514fd814d944f8ce965620293a16e5e8a285d (patch)
tree: 4b4b79eec0d1cbcc413dd3c6991b6d57439edd86 /ggml/src/ggml-cuda.cu
parent: 1f61e91862dd0b077ccb60459f3cc03f364ee279 (diff)
1 file changed, 4 insertions, 0 deletions
diff --git a/ggml/src/ggml-cuda.cu b/ggml/src/ggml-cuda.cu
index ca57efbd..966c91c0 100644
--- a/ggml/src/ggml-cuda.cu
+++ b/ggml/src/ggml-cuda.cu
@@ -2233,6 +2233,9 @@ static bool ggml_cuda_compute_forward(ggml_backend_cuda_context & ctx, struct gg
                 case GGML_UNARY_OP_SILU:
                     ggml_cuda_op_silu(ctx, dst);
                     break;
+                case GGML_UNARY_OP_SWIGLU:
+                    ggml_cuda_op_swiglu(ctx, dst);
+                    break;
                 case GGML_UNARY_OP_GELU_QUICK:
                     ggml_cuda_op_gelu_quick(ctx, dst);
                     break;
@@ -2773,6 +2776,7 @@ GGML_CALL static bool ggml_backend_cuda_supports_op(ggml_backend_t backend, cons
             switch (ggml_get_unary_op(op)) {
                 case GGML_UNARY_OP_GELU:
                 case GGML_UNARY_OP_SILU:
+                case GGML_UNARY_OP_SWIGLU:
                 case GGML_UNARY_OP_RELU:
                 case GGML_UNARY_OP_SIGMOID:
                 case GGML_UNARY_OP_HARDSIGMOID:
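The diff registers the new op in two places: the dispatch switch in `ggml_cuda_compute_forward` and the capability switch in `ggml_backend_cuda_supports_op`. As we understand ggml's backend design, the second tells the scheduler the CUDA backend may run the op, and the first actually launches it, so the two lists have to agree. The sketch below mirrors that pairing with a trimmed-down enum; `UnaryOp`, `backend_supports`, `compute_forward` and `check_registrations` are hypothetical names, not ggml's API, and the comments stand in for real kernel launches.

```cpp
#include <array>
#include <cassert>

// Hypothetical, trimmed-down stand-in for ggml's unary-op enum (not the real definition).
enum class UnaryOp { Gelu, Silu, Swiglu, Relu, GeluQuick, Exp };

constexpr std::array<UnaryOp, 6> kAllOps = {UnaryOp::Gelu, UnaryOp::Silu, UnaryOp::Swiglu,
                                            UnaryOp::Relu, UnaryOp::GeluQuick, UnaryOp::Exp};

// Mirrors the supports_op switch: ops listed here may be scheduled on this backend.
bool backend_supports(UnaryOp op) {
    switch (op) {
        case UnaryOp::Gelu:
        case UnaryOp::Silu:
        case UnaryOp::Swiglu:  // the one-line addition in the second hunk
        case UnaryOp::Relu:
        case UnaryOp::GeluQuick:
            return true;
        default:
            return false;
    }
}

// Mirrors the compute_forward switch: returns false when no kernel handles the op.
bool compute_forward(UnaryOp op) {
    switch (op) {
        case UnaryOp::Gelu:      /* launch GELU kernel */       return true;
        case UnaryOp::Silu:      /* launch SiLU kernel */       return true;
        case UnaryOp::Swiglu:    /* launch SwiGLU kernel */     return true;
        case UnaryOp::Relu:      /* launch ReLU kernel */       return true;
        case UnaryOp::GeluQuick: /* launch quick-GELU kernel */ return true;
        default:                                                return false;
    }
}

// Debug-time check: every op the backend advertises must also be dispatchable.
void check_registrations() {
    for (UnaryOp op : kAllOps) {
        assert(!backend_supports(op) || compute_forward(op));
    }
}
```

Advertising an op in the capability list without a matching dispatch case would let the scheduler place it on the GPU only for the compute step to fail, which is why a change like this one touches both switches together.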