author | Georgi Gerganov <ggerganov@gmail.com> | 2023-12-01 10:51:24 +0200 |
---|---|---|
committer | GitHub <noreply@github.com> | 2023-12-01 10:51:24 +0200 |
commit | ef47ec18da469423c276b683dd9b5741cee7023e (patch) | |
tree | ec3b4780dbe8f629425de499b298e8eadfd1aa4d /gguf-py | |
parent | 1d144112c0fbbb4ecc07dbcf4f05a380148bd6de (diff) |
ggml : add ggml_soft_max_ext (#4256)
* metal : implement soft_max_ext
* cuda : implement soft_max_ext
* ggml : implement soft_max_ext (CPU)
* batched-bench : print threads
ggml-ci
* metal : simplify soft_max encoding
ggml-ci
* cuda : use 512 threads for soft_max instead of 32
* ggml : update soft max cpu
* cuda : do warp-based block reduce
* cuda : increase max block size to 1024
* cuda : fix warp reduction initialization of shared mem
* metal : warp-based reduction for soft max kernel
* metal : warp-based reduce for rms_norm
* metal : simplify soft max kernel
ggml-ci
* alloc : fix build with debug
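
For readers unfamiliar with the operator named in the commit title, the following is a minimal scalar reference sketch, not the code from this commit. It assumes that `ggml_soft_max_ext` fuses scaling, an additive mask, and softmax into one step, i.e. computes `softmax(x * scale + mask)` per row. The Python function name, its signature, and the all-masked fallback are illustrative assumptions; the actual backends (CPU, CUDA, Metal) implement this in C and GPU kernels with parallel reductions.

```python
import math


def soft_max_ext(row, mask=None, scale=1.0):
    """Reference softmax over one row: softmax(row * scale + mask).

    Hypothetical helper for illustration only; it is not part of ggml
    or gguf-py.
    """
    if mask is None:
        mask = [0.0] * len(row)

    # Scale the logits and add the additive mask (use -inf to hide a position).
    vals = [x * scale + m for x, m in zip(row, mask)]

    # Subtract the row maximum before exponentiating for numerical stability.
    vmax = max(vals)
    if vmax == -math.inf:
        # Assumption: if every position is masked, return all zeros.
        return [0.0] * len(row)

    exps = [math.exp(v - vmax) for v in vals]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    # Third position is masked out, so it receives zero probability.
    print(soft_max_ext([1.0, 2.0, 3.0], mask=[0.0, 0.0, -math.inf], scale=0.5))
```

The max and sum steps in this sketch are the two row-wide reductions that the "warp-based" bullets above refer to: on a GPU each is computed cooperatively by the threads of a block rather than in a sequential loop.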
Diffstat (limited to 'gguf-py')
0 files changed, 0 insertions, 0 deletions