ik_llama.cpp.git

author	Nexes the Elder <124105151+Nexesenex@users.noreply.github.com>	2025-05-22 17:04:47 +0200
committer	GitHub <noreply@github.com>	2025-05-22 18:04:47 +0300
commit	ec4563221e22dda28fa073840a252e5956a87267 (patch)
tree	cb452337a59a39ca21f3b9dbab55811af4b1784b /ggml/src/ggml-cuda/common.cuh
parent	b94cd3b632a78dfb46b18d52b84be66bcf26166a (diff)

Streamline a bit the quant strategies (#443)

* Streamline a bit the quant strategies

  No change over the existing patterns, except for the bump for attn_k and attn_v for the models with 4 and 6 experts (several frankensteins seen on HF, which also use GQA).
  The rest is applying the existing patterns to the new IQ_K quants.
  Also, a Q8_0 for attn_q had slipped into the 8-expert MoE rule; I removed it, because that tensor is much bigger than attn_k or attn_v.

* remove <=8 experts condition.

Diffstat (limited to 'ggml/src/ggml-cuda/common.cuh')

0 files changed, 0 insertions, 0 deletions

