ik_llama.cpp.git - Unnamed repository; edit this file 'description' to name the repository.

author	Kawrakow <iwankawrakow@gmail.com>	2025-04-03 07:15:49 +0200
committer	GitHub <noreply@github.com>	2025-04-03 07:15:49 +0200
commit	07dbc1aa06d761634419759431ebb215baf698bb (patch)
tree	6da38e23c3a954ddfa5ea26a9babb0b3ec334541 /examples/llama-bench
parent	6d405d1fd1bddfe31fcbf00c9c8652a0a9166887 (diff)

Metal: much faster MoE prompt processing (#307)

* MoE improvements on Metal

  This version beats mainline, but there are things I don't understand:
  * Mainline has effectively gone to GEMV for MUL_MAT_ID. We can do the same, but we are 30% slower. Why?
  * Using actual GEMM, we beat mainline with a ubatch size of 128. But then performance degrades. Why?

* Some cleanup

* Much better

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>

Diffstat (limited to 'examples/llama-bench')

0 files changed, 0 insertions, 0 deletions

