author | Kawrakow <iwankawrakow@gmail.com> | 2025-03-25 07:47:10 +0100 |
---|---|---|
committer | GitHub <noreply@github.com> | 2025-03-25 07:47:10 +0100 |
commit | 98a264a2ea21761322847ac562f58d986ef6c512 (patch) | |
tree | 53b92f2d9cca9e0c39d5d693b32863827936c116 /examples | |
parent | f9307d79071c2a1e8efe10ecb1e1304bf77c021a (diff) |
CUDA: better MoE implementation (#283)
* Make fused MoE reproducible
As a bonus, peak performance at pp2048 with u_batch = 2048 is
~8% better.
* Slightly better
* Also do it for non-fused mul_mat_id
---------
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
Diffstat (limited to 'examples')
0 files changed, 0 insertions, 0 deletions