path: root/ggml/src/ggml-cuda/common.cuh
author: Nexes the Elder <124105151+Nexesenex@users.noreply.github.com> 2025-05-22 17:04:47 +0200
committer: GitHub <noreply@github.com> 2025-05-22 18:04:47 +0300
commit: ec4563221e22dda28fa073840a252e5956a87267
tree: cb452337a59a39ca21f3b9dbab55811af4b1784b /ggml/src/ggml-cuda/common.cuh
parent: b94cd3b632a78dfb46b18d52b84be66bcf26166a
Streamline a bit the quant strategies (#443)
* Streamline a bit the quant strategies

  No change to the existing patterns, except for a bump for attn_k and attn_v on models with 4 and 6 experts (several frankensteins seen on HF, which also use GQA). The rest applies the existing patterns to the new IQ_K quants. Also, a Q8_0 for attn_q had slipped into the rule for MoEs with 8 experts; I removed it, because that tensor is much bigger than attn_k or attn_v.

* remove <=8 experts condition.
Diffstat (limited to 'ggml/src/ggml-cuda/common.cuh')
0 files changed, 0 insertions, 0 deletions