ik_llama.cpp.git

author	Kawrakow <48489457+ikawrakow@users.noreply.github.com>	2024-03-11 16:53:15 +0100
committer	GitHub <noreply@github.com>	2024-03-11 17:53:15 +0200
commit	44ca159faf4fbe1a7ace13a962845ba7cdfd95ec (patch)
tree	d983c8b474f139f5973d5712886018595dfa2d5d /examples/llm.vim
parent	05b06210c954491cf0f12034b0a62bd4d69ce78b (diff)

1.5 bit: we can do even better (#5999)

* iq1_s: we can do even better

  Spent one of the 4 scale bits on the sign of a 0.125 shift.
  I.e., quants are now -1 + delta, delta, 1 + delta, where delta
  is +/- 0.125.

  CUDA works, same performance as before.
  PPL(LLaMA-v2-7B) is now 11.85!

* iq1_s: make scalar and AVX2 work with the new version

* iq1_s: make Neon work with the new version

  ~10% drop in performance, so this will need some more work.

* iq1_s: make Metal work with the new version

* iq1_s: very slightly faster dequantize on Metal

* iq1_s: fix dequantize on the CPU

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
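
The shifted quant levels described in the commit message can be illustrated with a small sketch. This is not the code from the patch: the function names, the 4-bit field layout (low 3 bits as scale index, top bit as the sign of the shift), and the odd-integer scale mapping are all assumptions made for illustration. Only the shift magnitude of 0.125 and the levels -1 + delta, delta, 1 + delta come from the message itself.

```python
# Minimal sketch of the shifted ternary levels; layout and names are hypothetical.
IQ1S_DELTA = 0.125  # shift magnitude stated in the commit message


def unpack_scale_field(field4):
    """Split a 4-bit field into (scale_index, delta).

    Assumed layout: bits 0..2 hold the scale index, bit 3 holds the
    sign of the shift (set means negative).
    """
    if not 0 <= field4 <= 0xF:
        raise ValueError("field4 must fit in 4 bits")
    scale_index = field4 & 0x7
    delta = -IQ1S_DELTA if field4 & 0x8 else IQ1S_DELTA
    return scale_index, delta


def dequantize(ternary, field4, d=1.0):
    """Map ternary quants in {-1, 0, 1} to d * scale * (q + delta)."""
    scale_index, delta = unpack_scale_field(field4)
    scale = d * (2 * scale_index + 1)  # assumed mapping: 3 bits -> odd integers 1..15
    return [scale * (q + delta) for q in ternary]


if __name__ == "__main__":
    print(dequantize([-1, 0, 1], 0b0000))  # positive shift
    print(dequantize([-1, 0, 1], 0b1000))  # negative shift
```

With a positive shift the three levels become -0.875, 0.125, 1.125, and with a negative shift -1.125, -0.125, 0.875, so each block can place its levels slightly off-center at the cost of one bit of scale resolution.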

Diffstat (limited to 'examples/llm.vim')

0 files changed, 0 insertions, 0 deletions
