path: root/ggml-quants.c
author Iwan Kawrakow <iwan.kawrakow@gmail.com> 2024-06-08 13:47:02 +0300
committer Iwan Kawrakow <iwan.kawrakow@gmail.com> 2024-06-22 12:02:50 +0300
commit 8a80a31ddd5f3239ab1da6deff1efcdf4f43d1d9
tree 78ccaf60d6f17dbead0658ac1057006920d5c324 /ggml-quants.c
parent 81409a02f3c10a74ea23167f1782a951d026ab49
iqk_mul_mat: fix q8_0
I was happily using _mm256_packs_epi32() to pack the q8_0 x q8_0 dot products back to int16_t, and getting useful results. But theoretically this can overflow, so it is better to use _mm256_unpacklo_ and _mm256_unpackhi_ to combine the 4 dot products using int32_t additions. This is (almost) as fast, unlike _mm256_hadd_epi32(), which seems excessively slow on the Ryzen-7950X.
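
The unpack-and-add reduction described above can be sketched roughly as follows. This is an illustration only, not the code from this commit: the helper names `combine4_i32` and `saturate_to_i16` are hypothetical, and the layout (four accumulators of eight int32_t lanes each) is an assumption. `saturate_to_i16` models the per-lane signed saturation that _mm256_packs_epi32() applies. As a plausible worst case, a lane holding the sum of four int8 products could reach 4 * 127 * 127 = 64516, which does not fit in int16_t. The AVX2 path is compiled only when `__AVX2__` is defined; otherwise a scalar loop computes the same result.

```c
#include <assert.h>
#include <stdint.h>
#if defined(__AVX2__)
#include <immintrin.h>
#endif

/* Scalar model of what _mm256_packs_epi32() does to each lane:
 * signed saturation of an int32_t value to the int16_t range. */
static int16_t saturate_to_i16(int32_t v) {
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Hypothetical helper: reduce four 8-lane int32_t accumulators a, b, c, d.
 * out[0..3] receive the sums of lanes 0..3 of a, b, c, d respectively;
 * out[4..7] receive the sums of lanes 4..7. All arithmetic stays in int32_t. */
static void combine4_i32(const int32_t *a, const int32_t *b,
                         const int32_t *c, const int32_t *d, int32_t *out) {
    assert(a && b && c && d && out);
#if defined(__AVX2__)
    __m256i va = _mm256_loadu_si256((const __m256i *)a);
    __m256i vb = _mm256_loadu_si256((const __m256i *)b);
    __m256i vc = _mm256_loadu_si256((const __m256i *)c);
    __m256i vd = _mm256_loadu_si256((const __m256i *)d);
    /* Per 128-bit half: [a0+a2, b0+b2, a1+a3, b1+b3] */
    __m256i ab = _mm256_add_epi32(_mm256_unpacklo_epi32(va, vb),
                                  _mm256_unpackhi_epi32(va, vb));
    /* Per 128-bit half: [c0+c2, d0+d2, c1+c3, d1+d3] */
    __m256i cd = _mm256_add_epi32(_mm256_unpacklo_epi32(vc, vd),
                                  _mm256_unpackhi_epi32(vc, vd));
    /* Per 128-bit half: [sum(a), sum(b), sum(c), sum(d)] */
    __m256i r = _mm256_add_epi32(_mm256_unpacklo_epi64(ab, cd),
                                 _mm256_unpackhi_epi64(ab, cd));
    _mm256_storeu_si256((__m256i *)out, r);
#else
    /* Portable fallback with the same output layout as the AVX2 path. */
    const int32_t *v[4] = {a, b, c, d};
    for (int half = 0; half < 2; ++half) {
        for (int k = 0; k < 4; ++k) {
            int32_t s = 0;
            for (int j = 0; j < 4; ++j) s += v[k][4 * half + j];
            out[4 * half + k] = s;
        }
    }
#endif
}
```

In this sketch each output lane is produced with int32_t additions only, so a lane value such as 64516 survives intact, whereas `saturate_to_i16(64516)` clamps to 32767.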
Diffstat (limited to 'ggml-quants.c')
0 files changed, 0 insertions, 0 deletions