author Nexes the Elder <124105151+Nexesenex@users.noreply.github.com> 2024-10-19 17:24:43 +0200
committer GitHub <noreply@github.com> 2024-10-19 17:24:43 +0200
commit a077f09bcb33a07c33408e7eb078529aa4fa6b4a
tree 00ee6794d80a098b294ee09fcb285498398138bd /ggml/include
parent 7b886ae3d876dfb569cdd02cca688066315a0667
Quant strategies: attn_q Q4 & attn_v Q6 for Llama 3.1 Q5_K_S (#96)
* attn_q Q4 & attn_v Q6 for Llama 3.1 Q5_K_S

  This pattern is worth testing on more quant types and on L3 8B.

  PPL 512: -0.024 for 70B; -0.005 for 8B
  Size: -640 MiB for 70B; -64 MiB for 8B

  70B Q5_K_S now has 0.012 lower PPL than Q5_K_M.

  I suspect the same holds for L3, which was quite insensitive to attn_q quantization.

* indent
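The strategy above amounts to a per-tensor type override inside the Q5_K_S mix. The following is a minimal sketch of that idea, not the commit's actual code. It assumes that "Q4" and "Q6" mean the k-quant types Q4_K and Q6_K, and that tensors use GGUF-style names such as `blk.0.attn_q.weight`. The `QuantType` enum and the `q5_k_s_tensor_type` function are hypothetical stand-ins and are not part of ggml's API.

```cpp
#include <string>

// Hypothetical stand-ins for ggml quantization types; not the real ggml enum.
enum class QuantType { Q4_K, Q5_K, Q6_K };

// True if `name` ends with `suffix`.
static bool ends_with(const std::string& name, const std::string& suffix) {
    return name.size() >= suffix.size() &&
           name.compare(name.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Hypothetical helper sketching the Q5_K_S rule from the commit message:
// attn_q drops to Q4_K, attn_v rises to Q6_K, and every other tensor keeps
// a Q5_K default. Real quant mixes apply further per-tensor rules not shown.
QuantType q5_k_s_tensor_type(const std::string& tensor_name) {
    if (ends_with(tensor_name, ".attn_q.weight")) return QuantType::Q4_K;
    if (ends_with(tensor_name, ".attn_v.weight")) return QuantType::Q6_K;
    return QuantType::Q5_K;
}
```

The sketch matches the full `.attn_q.weight` suffix instead of the bare substring `attn_q`. This prevents a fused `attn_qkv` tensor, if a model has one, from being caught by the attn_q rule.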
Diffstat (limited to 'ggml/include')
0 files changed, 0 insertions, 0 deletions