author | Kawrakow <iwankawrakow@gmail.com> | 2025-01-12 13:19:14 +0200 |
---|---|---|
committer | GitHub <noreply@github.com> | 2025-01-12 13:19:14 +0200 |
commit | c19404bcdaa2a1f8900801d4865673e5f7a03f63 (patch) | |
tree | d558fa4a0ace9333730cb7afa98644660091fc13 /examples | |
parent | 7553989dd88749de028853f9c0ea39651aad92a3 (diff) |
MoE fix for R4 quants (#170)
* Fix bug in iqk_mul_mat
I recently added support for a matrix multiplication kernel
that processes 16 columns of the right matrix per iteration.
This introduced a bug that shows up when the batch size is greater
than 16, is not a multiple of 16, and the remainder is not a multiple
of the maximum number of columns processed by the regular kernels
(which is why it never showed up in my testing with TG-128 and PP-512).
This commit fixes the issue.
* Make sure the number of rows per thread is a multiple of 4 for MoE as well when using _r4 quants
---------
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
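
The two conditions described in the commit message can be illustrated with a small sketch. This is not the code from iqk_mul_mat: the helper names `split_columns` and `rows_per_thread_r4`, the regular kernel width of 8, and the reading of `_r4` as "rows handled in groups of 4" are all assumptions made for illustration.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not from the repository): split n right-matrix columns
// into tiles. Full tiles of `wide` columns go first, then tiles of
// `regular_max` columns, and whatever is left becomes one final tail tile.
// The tail is the case the commit message describes: batch size > 16,
// not a multiple of 16, remainder not a multiple of the regular kernel width.
std::vector<int> split_columns(int n, int wide, int regular_max) {
    std::vector<int> tiles;
    int done = 0;
    for (; done + wide <= n; done += wide) tiles.push_back(wide);
    for (; done + regular_max <= n; done += regular_max) tiles.push_back(regular_max);
    if (done < n) tiles.push_back(n - done);  // leftover columns still need a kernel
    return tiles;
}

// Hypothetical helper: rows assigned to each thread, rounded up to a multiple
// of 4, assuming _r4 quants must be processed in groups of 4 rows.
int rows_per_thread_r4(int n_rows, int n_threads) {
    int per_thread = (n_rows + n_threads - 1) / n_threads;  // ceiling division
    return 4 * ((per_thread + 3) / 4);                      // round up to multiple of 4
}
```

With an assumed regular width of 8, `split_columns(512, 16, 8)` yields only 16-wide tiles and `split_columns(1, 16, 8)` never reaches the 16-wide kernel, which matches the remark that PP-512 and TG-128 (assuming a batch size of 1 for token generation) do not exercise the failing path. A batch of 29 gives tiles of 16, 8, and 5, so the tail path is taken.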
Diffstat (limited to 'examples')
-rw-r--r-- | examples/batched-bench/batched-bench.cpp | 2 |
1 files changed, 2 insertions, 0 deletions
diff --git a/examples/batched-bench/batched-bench.cpp b/examples/batched-bench/batched-bench.cpp
index 25e7c775..55f825fe 100644
--- a/examples/batched-bench/batched-bench.cpp
+++ b/examples/batched-bench/batched-bench.cpp
@@ -139,6 +139,8 @@ int main(int argc, char ** argv) {
                 const int n_ctx_req = is_pp_shared ? pp + pl*tg : pl*(pp + tg);
 
                 if (n_ctx_req > n_kv_max) {
+                    printf("n_ctx_req = %d is greater than n_kv_max = %d for pp = %d, tg = %d, pl = %d\n",
+                            n_ctx_req, n_kv_max, pp, tg, pl);
                     continue;
                 }
 
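
To see when the new message is printed, the `n_ctx_req` expression from the hunk can be evaluated in isolation. The sketch below copies that expression into a hypothetical function `n_ctx_required` and assumes that `pp` is the number of prompt tokens, `tg` the number of generated tokens per sequence, and `pl` the number of parallel sequences; the example value of `n_kv_max` is made up.

```cpp
#include <cassert>

// Same expression as in the hunk; the function name is hypothetical.
// With a shared prompt the pp tokens are counted once, otherwise every one
// of the pl sequences carries its own copy of the prompt.
int n_ctx_required(bool is_pp_shared, int pp, int tg, int pl) {
    return is_pp_shared ? pp + pl*tg : pl*(pp + tg);
}

// True when the configuration would be skipped (and, after this change,
// reported via printf) because it does not fit into the KV cache.
bool is_skipped(bool is_pp_shared, int pp, int tg, int pl, int n_kv_max) {
    return n_ctx_required(is_pp_shared, pp, tg, pl) > n_kv_max;
}
```

For pp = 512, tg = 128, pl = 4 and an assumed n_kv_max of 2048, the shared-prompt case needs 512 + 4*128 = 1024 cells and runs, while the non-shared case needs 4*(512 + 128) = 2560 cells and is skipped.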