author Kawrakow <iwankawrakow@gmail.com> 2025-04-03 07:15:49 +0200
committer GitHub <noreply@github.com> 2025-04-03 07:15:49 +0200
commit 07dbc1aa06d761634419759431ebb215baf698bb
tree 6da38e23c3a954ddfa5ea26a9babb0b3ec334541 /examples/llama-bench
parent 6d405d1fd1bddfe31fcbf00c9c8652a0a9166887
Metal: much faster MoE prompt processing (#307)
* MoE improvements on Metal

This version beats mainline, but there are things I don't understand:
* Mainline has effectively gone to GEMV for MUL_MAT_ID. We can do the same, but we are 30% slower. Why?
* Using actual GEMM, we beat mainline with a ubatch size of 128. But then performance degrades. Why?

* Some cleanup

* Much better

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
Diffstat (limited to 'examples/llama-bench')
0 files changed, 0 insertions, 0 deletions