author: Kawrakow <iwankawrakow@gmail.com> 2025-01-12 13:19:14 +0200
committer: GitHub <noreply@github.com> 2025-01-12 13:19:14 +0200
commit: c19404bcdaa2a1f8900801d4865673e5f7a03f63
tree: d558fa4a0ace9333730cb7afa98644660091fc13 /examples
parent: 7553989dd88749de028853f9c0ea39651aad92a3
MoE fix for R4 quants (#170)
* Fix bug in iqk_mul_mat

  I recently added the possibility to have a matrix multiplication
  kernel that processes 16 columns in the right matrix per iteration.
  This introduced a bug that shows up when the batch size is greater
  than 16, is not a multiple of 16, and the remainder is not a multiple
  of the maximum number of columns processed by the regular kernels
  (and so it never showed up in my testing using TG-128 and PP-512).
  This commit fixes the issue.

* Make sure rows per thread is a multiple of 4 also for MoE when using _r4 quants

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
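The triggering condition described in the message can be written out as a small predicate. This is only an illustration of the wording above, not code from iqk_mul_mat: the function name and the `max_regular_cols` parameter are hypothetical, and the value 8 used in the checks is an arbitrary stand-in, since the message does not state the maximum column count of the regular kernels. The checks also assume that TG-128 runs with a batch size of 1 and PP-512 with a batch size of 512.

```cpp
// Illustration only: mirrors the three conditions named in the commit message.
// kWideCols = 16 comes from the message; max_regular_cols is a hypothetical
// parameter for "the maximum columns being processed by the regular kernels".
constexpr int kWideCols = 16;

constexpr bool hits_described_case(int batch_size, int max_regular_cols) {
    return batch_size > kWideCols                                // greater than 16
        && batch_size % kWideCols != 0                           // not a multiple of 16
        && (batch_size % kWideCols) % max_regular_cols != 0;     // remainder not a multiple
}

// Assumed batch sizes for the two benchmarks named in the message.
static_assert(!hits_described_case(1, 8),   "TG-128: batch of 1 is not greater than 16");
static_assert(!hits_described_case(512, 8), "PP-512: 512 is a multiple of 16");
// A batch of 21 leaves a remainder of 5, which is not a multiple of 8.
static_assert(hits_described_case(21, 8),   "21 satisfies all three conditions");
```

Under these assumptions neither benchmark batch size satisfies the predicate, which is consistent with the remark that the bug never appeared in that testing.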
Diffstat (limited to 'examples')
-rw-r--r-- examples/batched-bench/batched-bench.cpp 2
1 files changed, 2 insertions, 0 deletions
diff --git a/examples/batched-bench/batched-bench.cpp b/examples/batched-bench/batched-bench.cpp
index 25e7c775..55f825fe 100644
--- a/examples/batched-bench/batched-bench.cpp
+++ b/examples/batched-bench/batched-bench.cpp
@@ -139,6 +139,8 @@ int main(int argc, char ** argv) {
const int n_ctx_req = is_pp_shared ? pp + pl*tg : pl*(pp + tg);
if (n_ctx_req > n_kv_max) {
+ printf("n_ctx_req = %d is greater than n_kv_max = %d for pp = %d, tg = %d, pl = %d\n",
+ n_ctx_req, n_kv_max, pp, tg, pl);
continue;
}
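To see when the new message would be printed, the context requirement from the hunk above can be evaluated on its own. The sketch below copies the expression for `n_ctx_req` into a standalone helper; the helper name is hypothetical, and reading `pp`, `tg`, and `pl` as prompt tokens, generated tokens, and number of parallel sequences is an assumption based on the variable names rather than something this diff states.

```cpp
// Standalone copy of the n_ctx_req expression from the diff, for illustration.
// n_ctx_required is a hypothetical name; the formula itself is taken verbatim.
constexpr int n_ctx_required(bool is_pp_shared, int pp, int tg, int pl) {
    return is_pp_shared ? pp + pl*tg : pl*(pp + tg);
}

// Shared prompt: the prompt is counted once, generation once per sequence.
static_assert(n_ctx_required(true, 512, 128, 4) == 1024, "512 + 4*128");
// Separate prompts: every sequence needs its own prompt and generation slots.
static_assert(n_ctx_required(false, 512, 128, 4) == 2560, "4*(512 + 128)");
```

With an `n_kv_max` of 2048, for example, the second combination would exceed the limit, so the loop would skip it and, after this change, report why.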