author | Shouzheng Liu <lshzh.hi@gmail.com> | 2023-08-24 12:27:25 -0400
---|---|---
committer | GitHub <noreply@github.com> | 2023-08-24 19:27:25 +0300
commit | 38b16dfca6e5032e6cfb90c1653bf1ba4cf647b4 (patch) |
tree | 0c85b951d8d62c6d3bc455ed41d0c5435324c032 /llama.cpp |
parent | 8f8c28e89cb9531211783da697d6e7c445e2af1d (diff) |
metal : bug-fix when enabling ggml-alloc (#2757)
* metal: better memory alloc w/ concurrency dispatch
ggml-alloc should only free tensors at memory barriers: with concurrency dispatch enabled, the ops between two barriers may execute in any order, so a buffer freed any earlier could still be read by a concurrently running op (see the first sketch after this list).
* ggml-alloc: avoid returning silently
In certain cases, the allocate_node() function could silently return without performing any memory allocation, leaving the node without a backing buffer (see the second sketch after this list).
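
A minimal sketch of the first idea, deferring frees until a memory barrier. All names here (`tensor`, `allocr`, `release_tensor`, `process_node`, `MAX_DEAD`) are hypothetical simplifications for illustration, not ggml-alloc's real API:

```c
// Sketch only: not the real ggml-alloc code.
#include <stdlib.h>

struct tensor {
    void * data;        // buffer backing this tensor
    int    is_barrier;  // node acts as a concurrency memory barrier
};

#define MAX_DEAD 64

struct allocr {
    struct tensor * dead[MAX_DEAD]; // tensors whose last use has passed
    int n_dead;
};

// Called once a tensor's last consumer has been scheduled. Instead of
// freeing immediately, the buffer is parked: ops between two barriers
// may run in any order, so a concurrent op could still be reading it.
static void release_tensor(struct allocr * a, struct tensor * t) {
    a->dead[a->n_dead++] = t;
}

// Called for every node during graph planning. At a barrier no
// concurrent reader can remain, so parked buffers are freed safely.
static void process_node(struct allocr * a, struct tensor * node) {
    if (node->is_barrier) {
        for (int i = 0; i < a->n_dead; i++) {
            free(a->dead[i]->data);
            a->dead[i]->data = NULL;
        }
        a->n_dead = 0;
    }
}
```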
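And a minimal sketch of the second fix. `allocate_node()` is the function named in the commit message, but the body below is an assumed illustration of the invariant being enforced, not the actual ggml-alloc implementation:

```c
// Sketch only: every path must leave node->data non-NULL. The bug being
// fixed was an early return that skipped allocation entirely.
#include <assert.h>
#include <stdlib.h>

struct tensor {
    void * data;            // NULL until a buffer is assigned
    struct tensor * parent; // operand whose buffer may be reused in place
    int can_reuse_parent;   // op is allowed to run in place on the parent
};

static void allocate_node(struct tensor * node, size_t size) {
    if (node->data != NULL) {
        return; // already has a buffer (e.g. a view): nothing to do
    }
    if (node->can_reuse_parent && node->parent != NULL && node->parent->data != NULL) {
        node->data = node->parent->data; // in-place reuse of the parent buffer
        return;
    }
    node->data = malloc(size); // otherwise allocate fresh; never return silently
    assert(node->data != NULL);
}
```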
Diffstat (limited to 'llama.cpp')
-rw-r--r-- | llama.cpp | 5 |
1 file changed, 0 insertions, 5 deletions
@@ -2707,11 +2707,6 @@ static struct ggml_cgraph * llm_build_falcon(
             struct ggml_tensor * inpFF = attn_norm;
 
             cur = ggml_mul_mat(ctx0, model.layers[il].w3, inpFF);
-
-            // TODO: this is temporary needed to introduce artificial dependency between FF and ATTN
-            // adding this, because there seems to be a bug in the Metal concurrency optimization
-            // without this line, the results are non-deterministic and wrong
-            cur->src[2] = attn_out;
             offload_func(cur);
 
             cur = ggml_gelu(ctx0, cur);