path: root/examples/quantize-stats/quantize-stats.cpp
author: Kawrakow <48489457+ikawrakow@users.noreply.github.com> 2023-09-11 09:30:11 +0200
committer: GitHub <noreply@github.com> 2023-09-11 10:30:11 +0300
commit: f31b6f4e2d6def3c0bd7c75f75c0c1e8698e0589
tree: 15c450ae8af732c4a0ce48452dc66fc2bfcd3fae /examples/quantize-stats/quantize-stats.cpp
parent: 6eeb4d90839bac1e6085e5544654ab5c319ad09a
metal : PP speedup (#3084)
* Minor speed gains for all quantization types

* metal: faster kernel_scale via float4

* Various other speedups for "small" kernels

* metal: faster soft_max via float4

* metal: faster diagonal infinity

  Although, to me it looks like one should simply fuse
  scale + diagonal infinity + soft_max on the KQ tensor.

* Another faster f16 x f32 matrix multiply kernel

* Reverting the diag infinity change

  It does work for PP, but somehow it fails for TG.
  Need to look more into it.

* metal: add back faster diagonal infinity

  This time more carefully

* metal : minor (readability)

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
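
The commit message suggests that scale, diagonal infinity (causal masking), and soft_max could be fused into one operation on the KQ tensor. As an illustration only, here is a minimal CPU sketch of what such a fusion might look like for a single row of attention scores. The function name `fused_scale_mask_softmax` and the parameters `n_past` and `i_q` are hypothetical and are not taken from ggml or from the Metal kernels in this commit.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper (not a ggml API): scale, causal mask, and softmax
// applied to one row of KQ scores without materializing intermediate tensors.
//   row    : attention scores of one query against all key positions
//   scale  : multiplicative factor, typically 1/sqrt(head_dim)
//   n_past : number of key positions preceding this batch (assumed >= 0)
//   i_q    : index of the query inside the batch (assumed >= 0)
void fused_scale_mask_softmax(std::vector<float> & row, float scale, int n_past, int i_q) {
    if (row.empty()) {
        return;
    }
    // Keys at positions > n_past + i_q would be set to -inf by the
    // "diagonal infinity" step; here they are skipped and zeroed at the end.
    const std::size_t n_visible =
        std::min(row.size(), static_cast<std::size_t>(n_past + i_q + 1));

    // Pass 1: apply the scale and track the maximum for numerical stability.
    float max_val = -std::numeric_limits<float>::infinity();
    for (std::size_t j = 0; j < n_visible; ++j) {
        row[j] *= scale;
        max_val = std::max(max_val, row[j]);
    }

    // Pass 2: exponentiate and accumulate the normalization constant.
    float sum = 0.0f;
    for (std::size_t j = 0; j < n_visible; ++j) {
        row[j] = std::exp(row[j] - max_val);
        sum += row[j];
    }

    // Pass 3: normalize visible entries; masked entries get probability 0,
    // which is what exp(-inf) would have produced.
    for (std::size_t j = 0; j < n_visible; ++j) {
        row[j] /= sum;
    }
    for (std::size_t j = n_visible; j < row.size(); ++j) {
        row[j] = 0.0f;
    }
}
```

A GPU kernel would distribute these loops across threads and use vector types such as float4, but the sketch shows why fusing could help: the masked entries are never written as -inf and then read back, and the row is touched by one routine instead of three.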
Diffstat (limited to 'examples/quantize-stats/quantize-stats.cpp')
0 files changed, 0 insertions, 0 deletions