ggml : update softmax n_task calculation (#5126)

Update the n_tasks calculation for GGML_OP_SOFT_MAX to use as many threads as are available, limited only by the number of rows, instead of capping it at 4. This improved prompt eval performance by around 5% for DOT kernels and by around 10% for MMLA kernels on AWS Graviton3.
author: snadampal <87143774+snadampal@users.noreply.github.com> 2024-01-26 11:17:59 -0600
committer: GitHub <noreply@github.com> 2024-01-26 19:17:59 +0200
commit: 7032f4f6349c17a8352f9f93f7d2122f45469e59 (patch)
tree: a46a86b55b9bd975fc60e8784da74b8ad64c18a5
parent: 5f1925a8cef81eb9b372faaae34b0dd76d5361d4 (diff)
1 files changed, 1 insertions, 1 deletions
diff --git a/ggml.c b/ggml.c
index ca98fde8..ef6fd8ca 100644
--- a/ggml.c
+++ b/ggml.c
@@ -16597,7 +16597,7 @@ static int ggml_get_n_tasks(struct ggml_tensor * node, int n_threads) {
             } break;
         case GGML_OP_SOFT_MAX:
             {
-                n_tasks = MIN(MIN(4, n_threads), ggml_nrows(node->src[0]));
+                n_tasks = MIN(n_threads, ggml_nrows(node->src[0]));
             } break;
         case GGML_OP_CONV_TRANSPOSE_1D:
             {
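
The effect of the one-line change can be sketched in isolation. The block below is a minimal illustration, not code from ggml.c: the helper names soft_max_n_tasks_old and soft_max_n_tasks_new are hypothetical, and n_rows stands in for the value of ggml_nrows(node->src[0]).

```c
#include <assert.h>
#include <stdint.h>

#define MIN(a, b) ((a) < (b) ? (a) : (b))

/* Hypothetical helper mirroring the removed line:
 * at most 4 tasks, and never more tasks than rows. */
static int soft_max_n_tasks_old(int n_threads, int64_t n_rows) {
    assert(n_threads >= 1 && n_rows >= 1);
    return (int) MIN((int64_t) MIN(4, n_threads), n_rows);
}

/* Hypothetical helper mirroring the added line:
 * one task per thread, and never more tasks than rows. */
static int soft_max_n_tasks_new(int n_threads, int64_t n_rows) {
    assert(n_threads >= 1 && n_rows >= 1);
    return (int) MIN((int64_t) n_threads, n_rows);
}
```

Under these assumptions, with 64 threads and 512 rows the old rule yields 4 tasks and the new rule yields 64. When there are fewer rows than threads (for example 3 rows), both rules yield 3, so the change only matters on machines with more than 4 threads and inputs with more than 4 rows.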