author | Kawrakow <iwankawrakow@gmail.com> | 2025-05-26 19:34:54 +0300 |
---|---|---|
committer | GitHub <noreply@github.com> | 2025-05-26 19:34:54 +0300 |
commit | 14292913260af89f37a6b856ef73bf88bda25129 (patch) | |
tree | 67aed02032ea97819a279e28957821f337221c9e /examples/server/server.cpp | |
parent | 24c010b3916b5f1bb9d712d610d1fe9308ef7df4 (diff) |
CUDA implementation for IQ2_K_R4, IQ3_K_R4, IQ4_K_R4, IQ5_K_R4 (#461)
* CUDA: iq4_k_r4 dequantize
* CUDA: iq4_k_r4 GEMV
~10% slower than iq4_k.
* CUDA: slightly faster iq4_k_r4 GEMV
* CUDA: slightly faster iq4_k_r4 GEMV
We are now within 3% of iq4_k
* CUDA: iq5_k_r4 dequantize
* CUDA: iq5_k_r4 GEMV
~3% slower than iq5_k.
* CUDA: iq3_k_r4 dequantize
* CUDA: iq3_k_r4 GEMV
* CUDA: slightly faster iq3_k_r4 GEMV
* CUDA: iq2_k_r4 GEMV
* CUDA: faster iq2_k_r4 GEMV
---------
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
Diffstat (limited to 'examples/server/server.cpp')
0 files changed, 0 insertions, 0 deletions