author Kawrakow <iwankawrakow@gmail.com> 2025-03-25 07:47:10 +0100
committer GitHub <noreply@github.com> 2025-03-25 07:47:10 +0100
commit 98a264a2ea21761322847ac562f58d986ef6c512
tree 53b92f2d9cca9e0c39d5d693b32863827936c116 /examples
parent f9307d79071c2a1e8efe10ecb1e1304bf77c021a
CUDA: better MoE implementation (#283)

* Make fused MoE reproducible

  As a bonus, peak performance at pp2048 with u_batch = 2048 is ~8% better.

* Slightly better

* Also do it for non-fused mul_mat_id

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>

Diffstat (limited to 'examples')
0 files changed, 0 insertions, 0 deletions