ik_llama.cpp.git

author	Kawrakow <iwankawrakow@gmail.com>	2025-06-18 07:29:33 +0300
committer	GitHub <noreply@github.com>	2025-06-18 07:29:33 +0300
commit	dc96820ddb45c639ea4e149e4bbfcb0b67fbcc2b (patch)
tree	2ac3011164d541f5899db1afdad375cc59bfc142 /ggml/src/ggml-quants.c
parent	8b3002bba2ea64b1de9ca2ff87207d8c37b0f08e (diff)

Much faster CPU prompt processing (part 2) (#533)

* iq4_ks: 203 t/s -> 357 t/s. iq4_ks_r4 is 242 t/s.
* iq4_k: 175 t/s -> 353 t/s. iq4_k_r4 is 208 t/s. PPL is actually lower!
* iq5_ks: 180 t/s -> 359 t/s. iq5_ks_r4 is 210 t/s. PPL is actually lower - 7.4160 vs 7.4494 for LlaMA-3.1-8B-Instruct.
* iq5_k - accuracy loss is too big.
* iq5_k - there was a bug with the shifts ...and that's why PPL was so high. It is also high on main. This fixes it.
* iq6_k: 148 t/s -> 350 t/s. There is no iq6_k_r4. PPL is actually lower because we have a bug in the existing implementation!
* iq3_k: 169 t/s -> 363 t/s. iq3_k_r4 is at 200 t/s.
* iq2_k: 190 t/s -> 364 t/s. iq2_k_r4 is at 232 t/s.
* iq2_ks: 200 t/s -> 367 t/s. There is no iq2_ks_r4.

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
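The throughput figures above are easier to compare as ratios. The following is a minimal sketch, not part of the commit: it copies the t/s numbers from the message and computes the speedup of each quant type over its previous implementation and, where an `_r4` figure is quoted, over the `_r4` variant. The names `results` and `speedups` are hypothetical and exist only for illustration. iq5_k is omitted because the message gives no throughput for it.

```python
# Prompt-processing throughput in tokens/s, copied from the commit message.
# Tuple layout: (before, after, r4), where r4 is None when the message
# states that no _r4 variant exists for that type.
results = {
    "iq4_ks": (203, 357, 242),
    "iq4_k": (175, 353, 208),
    "iq5_ks": (180, 359, 210),
    "iq6_k": (148, 350, None),
    "iq3_k": (169, 363, 200),
    "iq2_k": (190, 364, 232),
    "iq2_ks": (200, 367, None),
}


def speedups(table):
    """Return {type: (after/before, after/r4 or None)} for each entry."""
    out = {}
    for name, (before, after, r4) in table.items():
        vs_before = after / before
        vs_r4 = after / r4 if r4 is not None else None
        out[name] = (vs_before, vs_r4)
    return out


if __name__ == "__main__":
    for name, (vs_before, vs_r4) in speedups(results).items():
        r4_text = f"{vs_r4:.2f}x" if vs_r4 is not None else "n/a"
        print(f"{name:7s} vs before: {vs_before:.2f}x  vs _r4: {r4_text}")
```

Read this way, every listed type ends up faster than both its earlier implementation and its `_r4` counterpart, with the largest relative gain on iq6_k.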

Diffstat (limited to 'ggml/src/ggml-quants.c')

0 files changed, 0 insertions, 0 deletions

