author    Iwan Kawrakow <iwan.kawrakow@gmail.com>  2024-06-25 18:19:11 +0300
committer Iwan Kawrakow <iwan.kawrakow@gmail.com>  2024-06-25 18:19:11 +0300
commit    753dbaeeb0be5fb3d0d4337d7854dcf4f3a30fe1
tree      afedc73d7d8b8032f5c2057aec8bdff95e6601df /common/common.h
parent    8b436a84c53de4c5a8eaf9be72cdd82324da2eeb
bitnet: remove iq1_bn lookup table storing +/- signs
The AVX2 implementation was the only one still using it, so I checked whether we could get a performant implementation using the 0,1,2 lookup table instead. It turns out we can, and it is even slightly faster than the sign-based table. We now get PP-512 = 275 t/s and TG-128 = 57.7 t/s with 16 threads on the Ryzen-7950X. With only one lookup table left for iq1_bn, I renamed it to iq1bn_grid_u16.
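For readers unfamiliar with the idea, the following is a minimal scalar sketch of how ternary weights stored as 0,1,2 (rather than as +/- signs) can still yield a signed dot product. It is an illustration under stated assumptions, not the code from this commit: the 2-bits-per-value packing, the block size of 8, and the names pack_ternary, dot_ternary and kDigits are all hypothetical, and they do not describe the actual iq1_bn layout or the contents of iq1bn_grid_u16.

```cpp
#include <array>
#include <cstdint>

// Hypothetical block size: 8 ternary values per 16-bit word (2 bits each).
// This is NOT the real iq1_bn packing; it only illustrates the 0,1,2 idea.
constexpr int kDigits = 8;

// Store each weight w in {-1, 0, +1} as the unsigned value w + 1 in {0, 1, 2}.
uint16_t pack_ternary(const std::array<int8_t, kDigits>& w) {
    uint16_t packed = 0;
    for (int i = 0; i < kDigits; ++i) {
        packed |= static_cast<uint16_t>(w[i] + 1) << (2 * i);
    }
    return packed;
}

// Signed dot product computed from the unsigned 0,1,2 values:
//   sum((q - 1) * y) == sum(q * y) - sum(y)
// so no separate sign table is needed, only one correction term per block.
int32_t dot_ternary(uint16_t packed, const std::array<int8_t, kDigits>& y) {
    int32_t sum_qy = 0;
    int32_t sum_y  = 0;
    for (int i = 0; i < kDigits; ++i) {
        const int q = (packed >> (2 * i)) & 3;  // recover 0, 1 or 2
        sum_qy += q * y[i];
        sum_y  += y[i];
    }
    return sum_qy - sum_y;
}
```

The point of the sketch is only the identity in the comment: with unsigned 0,1,2 values the multiply-accumulate works on non-negative operands, and the sign is restored by subtracting the sum of the activations once.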
Diffstat (limited to 'common/common.h')
0 files changed, 0 insertions, 0 deletions