author | Nexes the Elder <124105151+Nexesenex@users.noreply.github.com> | 2025-05-22 17:04:47 +0200 |
---|---|---|
committer | GitHub <noreply@github.com> | 2025-05-22 18:04:47 +0300 |
commit | ec4563221e22dda28fa073840a252e5956a87267 (patch) | |
tree | cb452337a59a39ca21f3b9dbab55811af4b1784b /examples | |
parent | b94cd3b632a78dfb46b18d52b84be66bcf26166a (diff) |
Streamline a bit the quant strategies (#443)
* Streamline a bit the quant strategies
No change to the existing patterns, except for a bump for attn_k and attn_v on models with 4 and 6 experts (several frankenstein merges seen on HF have these expert counts, and they also use GQA).
The rest applies the existing patterns to the new IQ_K quants.
Also, a Q8_0 for attn_q had slipped into the rule for MoEs with 8 experts; I removed it because that tensor is much bigger than attn_k or attn_v.
* remove <=8 experts condition.
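To illustrate the remark about attn_q being much bigger than attn_k or attn_v under GQA, here is a minimal sketch of the size arithmetic. It is not code from this commit: the `AttnShapes` struct, the helper names, and the example dimensions (4096 embedding width, 32 query heads, 8 KV heads) are assumptions chosen for illustration. The only format fact used is that Q8_0 stores each block of 32 weights in 34 bytes (32 int8 values plus one fp16 scale).

```cpp
#include <cstdint>

// Hypothetical description of one attention layer (names are illustrative).
struct AttnShapes {
    std::int64_t n_embd;     // model embedding width
    std::int64_t n_head;     // number of query heads
    std::int64_t n_head_kv;  // number of key/value heads (smaller than n_head under GQA)
};

// Width of a single attention head.
constexpr std::int64_t head_dim(const AttnShapes& s) {
    return s.n_embd / s.n_head;
}

// attn_q projects n_embd inputs onto n_head heads.
constexpr std::int64_t attn_q_elems(const AttnShapes& s) {
    return s.n_embd * head_dim(s) * s.n_head;
}

// attn_k and attn_v each project n_embd inputs onto only n_head_kv heads.
constexpr std::int64_t attn_kv_elems(const AttnShapes& s) {
    return s.n_embd * head_dim(s) * s.n_head_kv;
}

// Q8_0 storage: 34 bytes per block of 32 weights.
constexpr std::int64_t q8_0_bytes(std::int64_t n_elems) {
    return n_elems / 32 * 34;
}

// Example shapes (assumed, not taken from any specific model in the commit).
constexpr AttnShapes kExample{4096, 32, 8};

// With 4x fewer KV heads, attn_q holds 4x the weights of attn_k or attn_v.
static_assert(attn_q_elems(kExample) == 4 * attn_kv_elems(kExample),
              "attn_q is n_head / n_head_kv times larger");
```

With these assumed shapes, keeping attn_q at Q8_0 costs four times as many bytes per layer as doing the same for attn_k or attn_v, which is why bumping only the K and V tensors is the cheaper way to spend extra bits.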
Diffstat (limited to 'examples')
0 files changed, 0 insertions, 0 deletions