summaryrefslogtreecommitdiff
path: root/llama.cpp
diff options
context:
space:
mode:
authorShouzheng Liu <lshzh.hi@gmail.com>2023-08-24 12:27:25 -0400
committerGitHub <noreply@github.com>2023-08-24 19:27:25 +0300
commit38b16dfca6e5032e6cfb90c1653bf1ba4cf647b4 (patch)
tree0c85b951d8d62c6d3bc455ed41d0c5435324c032 /llama.cpp
parent8f8c28e89cb9531211783da697d6e7c445e2af1d (diff)
metal : bug-fix when enable ggml-alloc (#2757)
* metal: better memory alloc w/ concurrency dispatch — the ggml-alloc should only free tensors at memory barriers. * ggml-alloc: avoid return silently — in certain cases, the allocate_node() function may silently return without performing any memory allocation.
Diffstat (limited to 'llama.cpp')
-rw-r--r--llama.cpp5
1 file changed, 0 insertions, 5 deletions
diff --git a/llama.cpp b/llama.cpp
index 7ee6bcda..b5266c1e 100644
--- a/llama.cpp
+++ b/llama.cpp
@@ -2707,11 +2707,6 @@ static struct ggml_cgraph * llm_build_falcon(
struct ggml_tensor * inpFF = attn_norm;
cur = ggml_mul_mat(ctx0, model.layers[il].w3, inpFF);
-
- // TODO: this is temporary needed to introduce artificial dependency between FF and ATTN
- // adding this, because there seems to be a bug in the Metal concurrency optimization
- // without this line, the results are non-deterministic and wrong
- cur->src[2] = attn_out;
offload_func(cur);
cur = ggml_gelu(ctx0, cur);