ik_llama.cpp.git

author	Kawrakow <iwankawrakow@gmail.com>	2025-03-27 10:48:52 +0100
committer	GitHub <noreply@github.com>	2025-03-27 10:48:52 +0100
commit	23b0addb34d8942baedc6f968460560392feadd3 (patch)
tree	9738a4d2a96860231afbe86f8e278632a31c4faf /ggml/src/iqk/iqk_quantize.cpp
parent	d0b52076da0261f291b01f1ffa44884c8b2cdb1c (diff)

Make sure tensor row size is multiple of block size also when quantizing with --pure (#294)

* WIP - not working
* q8_0 without bells and whistles works
* It works for q8_0
* Use bf16 instead of f16,int16
* q4_0_r8
* q5_0_r4
* q6_0_r4
* Also q4_1 and q5_1
* Add check if selected type is possible with --pure

I often want to quantize with --pure to see quantization performance without quantization mixes. But for models where there are tensors with row sizes that are not a multiple of 256, this results in a crash for k- and i-quants. Hence, let's add a check if the quant selected via --pure is applicable, and change it if not.

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
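The check described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: `QuantType`, `block_size`, `row_fits` and `pick_pure_type` are hypothetical names, and the choice of q8_0 as the default replacement type is an assumption. The only facts relied on are that k- and i-quants use 256-element super-blocks and that q8_0 uses 32-element blocks.

```cpp
#include <cstdint>

// Hypothetical stand-ins for a few ggml quantization types (not the real ggml enum).
enum class QuantType { Q8_0, Q4_K, IQ4_XS };

// Elements per quantization block: 32 for q8_0, 256 for the k- and i-quants listed here.
constexpr int64_t block_size(QuantType t) {
    return t == QuantType::Q8_0 ? 32 : 256;
}

// A type is applicable to a tensor only if each row splits into whole blocks.
constexpr bool row_fits(int64_t n_per_row, QuantType t) {
    return n_per_row > 0 && n_per_row % block_size(t) == 0;
}

// Keep the type requested via --pure when it fits the row; otherwise switch
// to a fallback. Defaulting the fallback to q8_0 is an assumption made for
// this sketch. The fallback itself is not re-validated here; a real
// implementation would also have to handle rows that fit no candidate type.
constexpr QuantType pick_pure_type(int64_t n_per_row,
                                   QuantType requested,
                                   QuantType fallback = QuantType::Q8_0) {
    return row_fits(n_per_row, requested) ? requested : fallback;
}
```

For example, a row of 4096 elements keeps a requested k-quant, while a row of 288 elements (a multiple of 32 but not of 256) would be switched to the fallback instead of crashing.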

Diffstat (limited to 'ggml/src/iqk/iqk_quantize.cpp')

0 files changed, 0 insertions, 0 deletions

