| author | Kawrakow <iwankawrakow@gmail.com> | 2025-05-04 12:45:00 +0300 |
|---|---|---|
| committer | GitHub <noreply@github.com> | 2025-05-04 12:45:00 +0300 |
| commit | f7c9a0f036951fecab32e056df954ebc54f8688f | |
| tree | 277a7c5ee63fda3841488e38a1dda9d2a43e0094 /examples/llava/convert_image_encoder_to_gguf.py | |
| parent | 13281282986fb6783d0d7d64b3610bfb7085e749 | |
CUDA: MMQ for IQ4_KS (#374)
* WIP
* WIP: still getting illegal memory access
* CUDA: MMQ for iq4_ks now works
~25% faster than dequantize+cuBLAS, ~10% slower than Q4_0 MMQ.
---------
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
Diffstat (limited to 'examples/llava/convert_image_encoder_to_gguf.py')
0 files changed, 0 insertions, 0 deletions
