summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
authorKawrakow <48489457+ikawrakow@users.noreply.github.com>2024-08-14 10:40:09 +0200
committerGitHub <noreply@github.com>2024-08-14 10:40:09 +0200
commit6c5384f20e8657a23aa9d4e0e9856d3d7563a12a (patch)
treeec7633442345b8e69eee23180d8cf56fe0f59811
parentbb5ff6fadec40c2e3aa3033dc68bec9367a0c9cc (diff)
Skip barriers of noops (#19)
GGML_OP_RESHAPE, GGML_OP_VIEW, GGML_OP_PERMUTE, GGML_OP_TRANSPOSE, along with GGML_OP_NONE, are all noops. I.e., nothing happens. But ggml still has a barrier after them, which wastes time. The waste is not too bad for large models where computations are long compared to the time taken for thread synchronization. But for small models skipping those unnecessary waits makes a significant difference. E.g., for the 99M TriLM model, TG-500 goes up to 1426 t/s from 1240 t/s. Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
-rw-r--r--ggml/include/ggml.h2
-rw-r--r--ggml/src/ggml.c7
2 files changed, 9 insertions, 0 deletions
diff --git a/ggml/include/ggml.h b/ggml/include/ggml.h
index b9b0284b..026993db 100644
--- a/ggml/include/ggml.h
+++ b/ggml/include/ggml.h
@@ -749,6 +749,8 @@ extern "C" {
GGML_API GGML_CALL const char * ggml_op_name (enum ggml_op op);
GGML_API const char * ggml_op_symbol(enum ggml_op op);
+ GGML_API GGML_CALL bool ggml_is_noop(const struct ggml_tensor * tensor);
+
GGML_API const char * ggml_unary_op_name(enum ggml_unary_op op);
GGML_API GGML_CALL const char * ggml_op_desc(const struct ggml_tensor * t); // unary or op name
diff --git a/ggml/src/ggml.c b/ggml/src/ggml.c
index 73054bfe..e7f1ae61 100644
--- a/ggml/src/ggml.c
+++ b/ggml/src/ggml.c
@@ -3581,6 +3581,11 @@ GGML_CALL bool ggml_is_empty(const struct ggml_tensor * tensor) {
return false;
}
+GGML_CALL bool ggml_is_noop(const struct ggml_tensor * tensor) {
+ return tensor->op == GGML_OP_NONE || tensor->op == GGML_OP_RESHAPE || tensor->op == GGML_OP_VIEW || tensor->op == GGML_OP_PERMUTE || tensor->op == GGML_OP_TRANSPOSE ||
+ ggml_is_empty(tensor) ? true : false;
+}
+
bool ggml_are_same_shape(const struct ggml_tensor * t0, const struct ggml_tensor * t1) {
static_assert(GGML_MAX_DIMS == 4, "GGML_MAX_DIMS is not 4 - update this function");
@@ -19208,6 +19213,8 @@ static thread_ret_t ggml_graph_compute_thread(void * data) {
for (int node_n = 0; node_n < cgraph->n_nodes; node_n++) {
struct ggml_tensor * node = cgraph->nodes[node_n];
+ if (ggml_is_noop(node)) continue;
+
ggml_compute_forward(&params, node);
if (state->ith == 0 && cplan->abort_callback && cplan->abort_callback(cplan->abort_callback_data)) {