author | Kawrakow <48489457+ikawrakow@users.noreply.github.com> | 2023-09-11 09:30:11 +0200 |
---|---|---|
committer | GitHub <noreply@github.com> | 2023-09-11 10:30:11 +0300 |
commit | f31b6f4e2d6def3c0bd7c75f75c0c1e8698e0589 (patch) | |
tree | 15c450ae8af732c4a0ce48452dc66fc2bfcd3fae /examples/server/json.hpp | |
parent | 6eeb4d90839bac1e6085e5544654ab5c319ad09a (diff) |
metal : PP speedup (#3084)
* Minor speed gains for all quantization types
* metal: faster kernel_scale via float4
* Various other speedups for "small" kernels
* metal: faster soft_max via float4
* metal: faster diagonal infinity
Although, to me it looks like one should simply
fuse scale + diagonal infinity + soft_max on the
KQ tensor.
* Another faster f16 x f32 matrix multiply kernel
* Reverting the diag infinity change
It does work for PP, but somehow it fails for TG.
Need to look more into it.
* metal: add back faster diagonal infinity
This time more carefully
* metal : minor (readability)
---------
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Diffstat (limited to 'examples/server/json.hpp')
0 files changed, 0 insertions, 0 deletions
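
The commit message suggests that scale, diagonal infinity and soft_max could be fused into a single operation on the KQ tensor. The following is only a rough sketch of that idea in plain Python, not the Metal kernels from this commit. The function names, the `n_past` and `row_idx` parameters, and the masking rule (column `c` is masked when `c > n_past + row_idx`) are assumptions made for illustration. It shows that one pass over the visible part of a row can give the same result as three separate passes.

```python
import math

def scale(row, s):
    # Pass 1: multiply every element by the scale factor.
    return [x * s for x in row]

def diag_mask_inf(row, row_idx, n_past):
    # Pass 2: set "future" columns to -inf (assumed rule: col > n_past + row_idx).
    return [x if col <= n_past + row_idx else -math.inf
            for col, x in enumerate(row)]

def soft_max(row):
    # Pass 3: numerically stable softmax over the row.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def unfused(row, s, row_idx, n_past):
    # Three separate passes, each reading and writing the whole row.
    return soft_max(diag_mask_inf(scale(row, s), row_idx, n_past))

def fused(row, s, row_idx, n_past):
    # Single routine: masked columns are never read, they are simply left at 0.
    n_visible = min(len(row), n_past + row_idx + 1)
    m = max(row[c] * s for c in range(n_visible))
    out = [0.0] * len(row)
    total = 0.0
    for c in range(n_visible):
        out[c] = math.exp(row[c] * s - m)
        total += out[c]
    for c in range(n_visible):
        out[c] /= total
    return out

if __name__ == "__main__":
    kq_row = [1.0, 2.0, 3.0, 4.0]
    print(unfused(kq_row, 0.5, row_idx=1, n_past=0))
    print(fused(kq_row, 0.5, row_idx=1, n_past=0))
```

In this sketch the fused version skips the masked columns entirely instead of writing `-inf` and then exponentiating it, which is where a fused GPU kernel would be expected to save memory traffic. Whether that carries over to the actual Metal kernels is not something this page establishes.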