| field | value | date |
|---|---|---|
| author | Kawrakow <iwankawrakow@gmail.com> | 2025-07-07 07:23:12 +0200 |
| committer | GitHub <noreply@github.com> | 2025-07-07 07:23:12 +0200 |
| commit | 4c0b66026619cf51f45249181bf2cc1de8cd6884 | |
| tree | 93c1b5474296180dda5eaf302ffa4ff615e4d62f /ggml/src/ggml-cuda/iqk_cuda_common.h | |
| parent | 6f3a3ba7e249cd689cb1ab0376e6504fb6cd49e7 | |
CUDA: small PP performance improvement for MoE models (#589)
* Trying to implement quantized fmoe - not working yet
* This works, but is slower than the non-working version
* quantize_mmq_q8_1_id
* Minor
---------
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
Diffstat (limited to 'ggml/src/ggml-cuda/iqk_cuda_common.h')
0 files changed, 0 insertions, 0 deletions