Fix ggml_compute_forward_dup_q (#269)

I broke it with PR #265. I was testing with a model where the wk_b and wk_v tensors were already present and did not need to be computed, so I did not notice that the change I made to ggml_compute_forward_dup_q breaks that computation.

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
author: Kawrakow <iwankawrakow@gmail.com> 2025-03-19 15:47:24 +0100
committer: GitHub <noreply@github.com> 2025-03-19 15:47:24 +0100
commit: 22c84a126f50146a851641ccaa6e8a24f0985d79 (patch)
tree: 2e09c885b40089880414f723e77c2433a0742ffb
parent: c3b75c531c5d5e295d5c3ba846bb37def72df3a7 (diff)
1 files changed, 5 insertions, 0 deletions
diff --git a/ggml/src/ggml.c b/ggml/src/ggml.c
index 1552d91b..faf1902d 100644
--- a/ggml/src/ggml.c
+++ b/ggml/src/ggml.c
@@ -10576,6 +10576,11 @@ static void ggml_compute_forward_dup_q(
     if (dst->type == GGML_TYPE_Q8_0 && dst->src[0]->type == GGML_TYPE_Q8_0 &&
             ggml_are_same_shape(dst, dst->src[0])) {
 
+        if (dst->src[0]->nb[0] == sizeof(block_q8_0) && dst->nb[0] == sizeof(block_q8_0)) {
+            ggml_compute_forward_dup_bytes(params, dst);
+            return;
+        }
+
         // we assume src is transposed and that's why we are here
 
         GGML_ASSERT(dst->ne[0] % QK8_0 == 0);
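
The added check can be illustrated outside of ggml with a minimal sketch. It assumes a simplified 2D view whose sizes are counted in blocks; the names blk_q8, view2d, and dup_q8 are hypothetical and not part of ggml. When both source and destination have a first-dimension stride equal to one block, the rows are contiguous and a plain byte copy suffices, which is the case the patch hands to ggml_compute_forward_dup_bytes. The fallback below is only a generic block-by-block strided copy, not the transposed path that follows in ggml.c.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define QK 32  /* elements per block, mirroring QK8_0 */

/* Hypothetical stand-in for ggml's block_q8_0: one scale plus QK quants. */
typedef struct {
    uint16_t d;       /* scale, stored here as raw fp16 bits */
    int8_t   qs[QK];  /* quantized values */
} blk_q8;

/* Hypothetical 2D view: n0 blocks per row, n1 rows, byte strides nb0/nb1. */
typedef struct {
    char  *data;
    size_t n0, n1;
    size_t nb0, nb1;
} view2d;

/* Copies src into dst (same shape assumed).
 * Returns 1 if the contiguous fast path was taken, 0 otherwise. */
static int dup_q8(const view2d *src, view2d *dst) {
    if (src->nb0 == sizeof(blk_q8) && dst->nb0 == sizeof(blk_q8)) {
        /* Blocks within a row are adjacent on both sides: copy whole rows. */
        for (size_t i1 = 0; i1 < src->n1; ++i1) {
            memcpy(dst->data + i1 * dst->nb1,
                   src->data + i1 * src->nb1,
                   src->n0 * sizeof(blk_q8));
        }
        return 1;
    }
    /* Generic fallback (illustrative only): copy one block at a time. */
    for (size_t i1 = 0; i1 < src->n1; ++i1) {
        for (size_t i0 = 0; i0 < src->n0; ++i0) {
            memcpy(dst->data + i1 * dst->nb1 + i0 * dst->nb0,
                   src->data + i1 * src->nb1 + i0 * src->nb0,
                   sizeof(blk_q8));
        }
    }
    return 0;
}
```

In this sketch a transposed view is one whose nb0 is larger than sizeof(blk_q8), so it skips the fast path, which is the same distinction the new condition in the patch draws.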