author | Kawrakow <48489457+ikawrakow@users.noreply.github.com> | 2024-09-04 07:24:04 +0300 |
---|---|---|
committer | GitHub <noreply@github.com> | 2024-09-04 07:24:04 +0300 |
commit | f17d0d72f565bf24d6eb8aa67d6618cdc143961d | |
tree | 69374b9512d52a47ba410a6982db0e2db1f1f236 /examples | |
parent | 8c94dcd43350b6bde8f5618f7e0e9f0b400a2ac6 |
Performance improvements for legacy quants on ARM_NEON (#37)
* WIP: trying to improve legacy quants
* WIP: trying to improve legacy quants
With this commit, PP-512 for LLaMA-3.1-8B goes from
72 t/s to 87.2 t/s for q4_0, and from 61.5 t/s to 73.9 t/s
for q4_1, a 20+% improvement for both.
---------
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
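
As a quick sanity check of the throughput figures quoted in the commit message, the relative gain can be recomputed from the before/after numbers. This is only a minimal sketch: the helper name `speedup_percent` and the `results` table are illustrative and not part of the commit, and the only inputs are the four t/s values stated above.

```python
def speedup_percent(before_tps: float, after_tps: float) -> float:
    """Relative throughput gain in percent, given tokens/second before and after."""
    return (after_tps / before_tps - 1.0) * 100.0


# PP-512 throughput (t/s) as quoted in the commit message: (before, after).
results = {
    "q4_0": (72.0, 87.2),
    "q4_1": (61.5, 73.9),
}

for quant, (before, after) in results.items():
    gain = speedup_percent(before, after)
    print(f"{quant}: {before} -> {after} t/s (+{gain:.1f}%)")
# q4_0: 72.0 -> 87.2 t/s (+21.1%)
# q4_1: 61.5 -> 73.9 t/s (+20.2%)
```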
Diffstat (limited to 'examples')
0 files changed, 0 insertions, 0 deletions