ik_llama.cpp.git

author	Kawrakow <48489457+ikawrakow@users.noreply.github.com>	2024-03-02 17:00:51 +0200
committer	GitHub <noreply@github.com>	2024-03-02 17:00:51 +0200
commit	bbde6eb2561153aabbdfac5001c690fe00cad639 (patch)
tree	7631fec8a47048dbe9bf733650568f8d3b13a2a1 /examples/infill/infill.cpp
parent	ef2cd694c4155fbf25bae61c5178c47eb3676dba (diff)

ggml : IQ3_S improvements (#5829)

* iq3_s: somewhat faster AVX2 dot product

  On a Ryzen 7950X, TG-128 increases to 16 t/s from 15.5 t/s using 16 threads. For 8 threads it is 13.85 t/s vs 11.75 t/s. PP-512 increases to 28.5 t/s from 23.8 t/s.

* iq3_s: somewhat faster ARM_NEON dot product

  Still dog slow - 10.7 t/s up from 9.9 t/s.

* iq3_s: another small ARM_NEON improvement

  10.7 -> 11.0 t/s. Using vmulq_s8 is faster than the xor - sub trick that works best on AVX2.

* iq3_s: minor improvement on Metal

  49.4 t/s -> 50.3 t/s

* iq3_s: PPL improvement

  E.g., for a context of 4096, LLaMA-v2-7B goes to 5.1340 from 5.1653.

* iq3_s: use new grid everywhere

* Fix ARM_NEON

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
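The commit message contrasts two ways of applying signs to quantized values: the "xor - sub trick" and a plain multiply (vmulq_s8 on ARM_NEON). As a rough scalar illustration only, and not the actual ggml kernels, the two approaches might look like the sketch below. The helper names `apply_sign_xor_sub` and `apply_sign_mul` are hypothetical and do not appear in the repository; the real code operates on SIMD vectors rather than single bytes.

```c
#include <stdint.h>

/* Hypothetical helper (not from ggml): xor - sub trick.
 * mask is 0 to keep x, or -1 (all bits set) to negate it.
 * (x ^ -1) is ~x == -x - 1, and subtracting -1 gives -x. */
static inline int8_t apply_sign_xor_sub(int8_t x, int8_t mask) {
    return (int8_t)((x ^ mask) - mask);
}

/* Hypothetical helper (not from ggml): multiply by a sign of +1 or -1.
 * This is the scalar analogue of what vmulq_s8 does for each lane. */
static inline int8_t apply_sign_mul(int8_t x, int8_t sign) {
    return (int8_t)(x * sign);
}
```

Both forms give the same result for inputs in the range -127..127; which one is faster depends on the instruction set, as the timings in the commit message suggest.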

Diffstat (limited to 'examples/infill/infill.cpp')

0 files changed, 0 insertions, 0 deletions

