Adding SWIGLU unary op (#65)

* Adding GGML_UNARY_OP_SWIGLU

  This commit implements the ggml op and CPU compute forward.
  I see ~3-4% speedup of PP-512 for Phi-3.5-mini.

* GGML_UNARY_OP_SWIGLU: CUDA implementation

  I observe ~12% speedup for PP-512(Phi-3.5-mini).

* GGML_UNARY_OP_SWIGLU: Metal implementation

  We get ~2% speedup for PP-512(Phi-3.5-mini).

* GGML_UNARY_OP_SWIGLU: minor improvement on Metal

* GGML_UNARY_OP_SWIGLU: cleanup

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
author: Kawrakow <iwankawrakow@gmail.com> 2024-09-28 13:37:25 +0300
committer: GitHub <noreply@github.com> 2024-09-28 13:37:25 +0300
commit: 737514fd814d944f8ce965620293a16e5e8a285d (patch)
tree: 4b4b79eec0d1cbcc413dd3c6991b6d57439edd86 /ggml/include/ggml.h
parent: 1f61e91862dd0b077ccb60459f3cc03f364ee279 (diff)
1 files changed, 5 insertions, 0 deletions
diff --git a/ggml/include/ggml.h b/ggml/include/ggml.h
index 6ac30b0f..36cc531f 100644
--- a/ggml/include/ggml.h
+++ b/ggml/include/ggml.h
@@ -564,6 +564,7 @@ extern "C" {
         GGML_UNARY_OP_SILU,
         GGML_UNARY_OP_HARDSWISH,
         GGML_UNARY_OP_HARDSIGMOID,
+        GGML_UNARY_OP_SWIGLU,
 
         GGML_UNARY_OP_COUNT,
     };
@@ -1127,6 +1128,10 @@ extern "C" {
             struct ggml_context * ctx,
             struct ggml_tensor  * a);
 
+    GGML_API struct ggml_tensor * ggml_swiglu(
+            struct ggml_context * ctx,
+            struct ggml_tensor  * a);
+
     // a - x
     // b - dy
     GGML_API struct ggml_tensor * ggml_silu_back(