ik_llama.cpp.git

author	Nexes the Elder <124105151+Nexesenex@users.noreply.github.com>	2024-10-19 17:24:43 +0200
committer	GitHub <noreply@github.com>	2024-10-19 17:24:43 +0200
commit	a077f09bcb33a07c33408e7eb078529aa4fa6b4a (patch)
tree	00ee6794d80a098b294ee09fcb285498398138bd /ggml/include
parent	7b886ae3d876dfb569cdd02cca688066315a0667 (diff)

Quant strategies: attn_q Q4 & attn_v Q6 for Llama 3.1 Q5_K_S (#96)

* attn_q Q4 & attn_v Q6 for Llama 3.1 Q5_K_S

  Pattern worth testing on more quants and on L3 8B.
  PPL 512 = -0.024 for 70b; -0.005 for 8b
  Size = -640MiB for 70b; -64MiB for 8b

  70b Q5_K_S now beats Q5_K_M by -0.012 ppl

  I suspect the same goes for L3 as well, which was quite insensitive to attn_q quantization.

* indent
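The strategy in the commit message amounts to a per-tensor override on top of the base quant mix: attn_q goes down to a 4-bit type and attn_v up to a 6-bit type. A minimal sketch of that idea follows. It is not the repository's code. The function name `pick_quant_type`, the `OVERRIDES` table, and the concrete type strings `Q4_K`, `Q5_K`, and `Q6_K` are assumptions made for illustration, since the message only says "Q4" and "Q6".

```python
# Hypothetical sketch of per-tensor quant type selection for a Q5_K_S-style mix.
# All names and type strings below are illustrative assumptions, not the
# actual implementation touched by this commit.

BASE_TYPE = "Q5_K"  # assumed default type for tensors without an override

# Assumed overrides matching the commit message: attn_q lower, attn_v higher.
OVERRIDES = {
    "attn_q.weight": "Q4_K",
    "attn_v.weight": "Q6_K",
}


def pick_quant_type(tensor_name: str) -> str:
    """Return the quant type for a tensor, falling back to the base type."""
    for suffix, qtype in OVERRIDES.items():
        if tensor_name.endswith(suffix):
            return qtype
    return BASE_TYPE


if __name__ == "__main__":
    # Tensor names follow the usual GGUF "blk.N.<name>.weight" pattern.
    for name in ("blk.0.attn_q.weight", "blk.0.attn_v.weight", "blk.0.ffn_down.weight"):
        print(name, "->", pick_quant_type(name))
```

Matching on the name suffix keeps the override independent of the layer index, so the same rule applies to every block.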

Diffstat (limited to 'ggml/include')

0 files changed, 0 insertions, 0 deletions

