ik_llama.cpp.git

author	Kawrakow <iwankawrakow@gmail.com>	2025-07-07 07:23:12 +0200
committer	GitHub <noreply@github.com>	2025-07-07 07:23:12 +0200
commit	4c0b66026619cf51f45249181bf2cc1de8cd6884 (patch)
tree	93c1b5474296180dda5eaf302ffa4ff615e4d62f /ggml/src/ggml-cuda/iqk_cuda_common.h
parent	6f3a3ba7e249cd689cb1ab0376e6504fb6cd49e7 (diff)

CUDA: small PP performance improvement for MoE models (#589)

* Trying to implement quantized fmoe - not working yet
* This works, but is slower than the non-working version
* quantize_mmq_q8_1_id
* Minor

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>

Diffstat (limited to 'ggml/src/ggml-cuda/iqk_cuda_common.h')

0 files changed, 0 insertions, 0 deletions

