From c19404bcdaa2a1f8900801d4865673e5f7a03f63 Mon Sep 17 00:00:00 2001
From: Kawrakow
Date: Sun, 12 Jan 2025 13:19:14 +0200
Subject: MoE fix for R4 quants (#170)

* Fix bug in iqk_mul_mat

I recently added the possibility to have a matrix multiplication kernel
that processes 16 columns in the right matrix per iteration. This
introduced a bug that shows up when batch size is greater than 16, is
not a multiple of 16, and the remainder is not a multiple of the maximum
columns being processed by the regular kernels (and so, never showed up
in my testing using TG-128 and PP-512).

This commit fixes the issue.

* Make sure rows per thread is a multiple of 4 also for MoE when using _r4 quants

---------

Co-authored-by: Iwan Kawrakow
---
 examples/batched-bench/batched-bench.cpp | 2 ++
 1 file changed, 2 insertions(+)

(limited to 'examples')

diff --git a/examples/batched-bench/batched-bench.cpp b/examples/batched-bench/batched-bench.cpp
index 25e7c775..55f825fe 100644
--- a/examples/batched-bench/batched-bench.cpp
+++ b/examples/batched-bench/batched-bench.cpp
@@ -139,6 +139,8 @@ int main(int argc, char ** argv) {
                 const int n_ctx_req = is_pp_shared ? pp + pl*tg : pl*(pp + tg);
 
                 if (n_ctx_req > n_kv_max) {
+                    printf("n_ctx_req = %d is greater than n_kv_max = %d for pp = %d, tg = %d, pl = %d\n",
+                            n_ctx_req, n_kv_max, pp, tg, pl);
                     continue;
                 }
 
-- 
cgit v1.2.3
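
The trigger condition described in the commit message can be restated as a small predicate. This is only an illustrative sketch, not code from iqk_mul_mat: the helper name `hits_affected_path`, the constant `kWideCols`, and the parameter `nrc_max` are hypothetical. The value 8 used for `nrc_max` is an assumption, since the message does not state the regular kernels' maximum column count, and the first check assumes TG runs with batch size 1.

```cpp
// Hypothetical helper, not part of iqk_mul_mat. It restates the three
// conditions from the commit message as a single predicate.

// Columns of the right matrix processed per iteration by the wide kernel
// (the value 16 is taken from the commit message).
constexpr int kWideCols = 16;

// n_batch: number of columns in the right matrix (the batch size).
// nrc_max: maximum columns the regular kernels process per call
//          (assumed value supplied by the caller; not given in the message).
constexpr bool hits_affected_path(int n_batch, int nrc_max) {
    return n_batch > kWideCols                        // batch size greater than 16
        && n_batch % kWideCols != 0                   // not a multiple of 16
        && (n_batch % kWideCols) % nrc_max != 0;      // remainder not a multiple of nrc_max
}

// Compile-time checks, using the assumed nrc_max = 8.
static_assert(!hits_affected_path(1, 8),   "batch size 1 is not greater than 16");
static_assert(!hits_affected_path(512, 8), "512 is a multiple of 16 (PP-512)");
static_assert(!hits_affected_path(24, 8),  "remainder 8 is a multiple of 8");
static_assert( hits_affected_path(20, 8),  "remainder 4 is not a multiple of 8");
static_assert( hits_affected_path(100, 8), "100 % 16 == 4 is not a multiple of 8");
```

Under these assumptions, a batch size of 512 or 1 never reaches the affected path, which is consistent with the note that the bug did not show up in PP-512 and TG-128 testing, while a batch size such as 20 or 100 does.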