path: root/iqk_mul_mat.cpp
Age | Commit message | Author
2024-06-22 | Bitnet(2.25 bpw): NEON | Iwan Kawrakow
2024-06-22 | Bitnet: 2.25 bpw version | Iwan Kawrakow
2024-06-22 | bitnet 2 bpw: NEON implementation | Iwan Kawrakow
2024-06-22 | Removed extra column | Iwan Kawrakow
2024-06-22 | bitnet 2 bpw: AVX2 implementation | Iwan Kawrakow
2024-06-22 | iqk_mul_mat(bitnet): fix typo | Iwan Kawrakow
2024-06-22 | iqk_mul_mat(bitnet): slightly faster AVX2 | Iwan Kawrakow
2024-06-22 | iq1_bn: better NEON implementation | Iwan Kawrakow
2024-06-22 | iq1_bn(NEON): works now, but very slow | Iwan Kawrakow
2024-06-22 | iqk_mul_mat(iq1_bn): WIP NEON - don't see why it is not working | Iwan Kawrakow
2024-06-22 | iqk_mul_mat(iq1_bn): WIP NEON (not working) | Iwan Kawrakow
2024-06-22 | iqk_mul_mat: improve iq1_bn (bitnet) on vanilla AVX2 | Iwan Kawrakow
2024-06-22 | iqk_mul_mat: improve iq1_bn (bitnet) on AVX2 | Iwan Kawrakow
2024-06-22 | bitnet: scale is per row, not per tensor | Iwan Kawrakow
2024-06-22 | iqk_mul_mat: add iq1_bn (bitnet) | Iwan Kawrakow
2024-06-22 | iqk_mul_mat: cleanup | Iwan Kawrakow
2024-06-22 | iqk_mul_mat: be independent of llamafile_sgemm | Iwan Kawrakow
2024-06-22 | iqk_mul_mat: be independent of llamafile_sgemm (WIP) | Iwan Kawrakow
2024-06-22 | iqk_mul_mat: be able to handle any f16/f32 combination on AVX2 | Iwan Kawrakow
2024-06-22 | iqk_mul_mat: turn on AVX512 | Iwan Kawrakow
2024-06-22 | iqk_mul_mat: slightly better fp16 with 16 vector registers | Iwan Kawrakow
2024-06-22 | iqk_mul_mat: better fp16 for AVX2 | Iwan Kawrakow
2024-06-22 | iqk_mul_mat: fp16 for Arm | Iwan Kawrakow
2024-06-22 | iqk_mul_mat: slightly faster FANCY_SIMD dot product | Iwan Kawrakow
2024-06-22 | iqk_mul_mat: fix q8_0 | Iwan Kawrakow
2024-06-22 | iqk_mul_mat: use block_q8_1_x4 also for AVX2 | Iwan Kawrakow
2024-06-22 | iqk_mul_mat: use block_q8_0_x4 also for AVX2 | Iwan Kawrakow
2024-06-22 | iqk_mul_mat: delete unused stuff | Iwan Kawrakow
2024-06-22 | iqk_mul_mat: add q8_0 | Iwan Kawrakow
2024-06-22 | iqk_mul_mat: fp16 tweaks | Iwan Kawrakow
2024-06-22 | iqk_mul_mat: fp16 implementation cleanup | Iwan Kawrakow
2024-06-22 | iqk_mul_mat: fp16 implementation for AVX2 | Iwan Kawrakow
2024-06-22 | iqk_mul_mat: make it independent of sgemm | Iwan Kawrakow
2024-06-22 | iqk_mul_mat: minor improvements | Iwan Kawrakow
2024-06-22 | iqk_mul_mat: no more templates in the IQ dequantizers | Iwan Kawrakow
2024-06-22 | iqk_mul_mat: remove template on one of the prepare() functions | Iwan Kawrakow
2024-06-22 | iqk_mul_mat: experimenting with zen4 | Iwan Kawrakow
2024-06-22 | iqk_mul_mat: experimenting with zen4 (iq2_xxs) | Iwan Kawrakow
2024-06-22 | iqk_mul_mat: experimenting with zen4 (iq2_xs) | Iwan Kawrakow
2024-06-22 | iqk_mul_mat: experimenting with zen4 (iq3_s and iq2_m) | Iwan Kawrakow
2024-06-22 | iqk_mul_mat: small improvement for iq3_s | Iwan Kawrakow
2024-06-22 | iqk_mul_mat: better AVX2 implementation for iq2_xxs | Iwan Kawrakow
2024-06-22 | iqk_mul_mat: better AVX2 implementation for iq2_xxs | Iwan Kawrakow
2024-06-22 | iqk_mul_mat: AVX2 implementation for iq2_xxs | Iwan Kawrakow
2024-06-22 | iqk_mul_mat: AVX2 implementation for iq2_xs | Iwan Kawrakow
2024-06-22 | iqk_mul_mat: AVX2 implementation for iq2_s | Iwan Kawrakow
2024-06-22 | Separate templates for TG and PP for i-quants on AVX2 | Iwan Kawrakow
2024-06-22 | iqk_mul_mat: AVX2 implementation for iq3_xxs | Iwan Kawrakow
2024-06-22 | iqk_mul_mat: AVX2 implementation for iq3_s | Iwan Kawrakow
2024-06-22 | Cleanup - Arm i-quants should be good now | Iwan Kawrakow