path: root/ggml/src/ggml-cuda/iqk_cuda_common.h
author: Kawrakow <iwankawrakow@gmail.com> 2025-07-07 07:23:12 +0200
committer: GitHub <noreply@github.com> 2025-07-07 07:23:12 +0200
commit: 4c0b66026619cf51f45249181bf2cc1de8cd6884 (patch)
tree: 93c1b5474296180dda5eaf302ffa4ff615e4d62f /ggml/src/ggml-cuda/iqk_cuda_common.h
parent: 6f3a3ba7e249cd689cb1ab0376e6504fb6cd49e7 (diff)
CUDA: small PP performance improvement for MoE models (#589)
* Trying to implement quantized fmoe - not working yet
* This works, but is slower than the non-working version
* quantize_mmq_q8_1_id
* Minor

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
Diffstat (limited to 'ggml/src/ggml-cuda/iqk_cuda_common.h')
0 files changed, 0 insertions, 0 deletions