author Georgi Gerganov <ggerganov@gmail.com> 2023-12-01 10:51:24 +0200
committer GitHub <noreply@github.com> 2023-12-01 10:51:24 +0200
commit ef47ec18da469423c276b683dd9b5741cee7023e
tree ec3b4780dbe8f629425de499b298e8eadfd1aa4d /gguf-py
parent 1d144112c0fbbb4ecc07dbcf4f05a380148bd6de
ggml : add ggml_soft_max_ext (#4256)
* metal : implement soft_max_ext
* cuda : implement soft_max_ext
* ggml : implement soft_max_ext (CPU)
* batched-bench : print threads ggml-ci
* metal : simplify soft_max encoding ggml-ci
* cuda : use 512 threads for soft_max instead of 32
* ggml : update soft max cpu
* cuda : do warp-based block reduce
* cuda : increase max block size to 1024
* cuda : fix warp reduction initialization of shared mem
* metal : warp-based reduction for soft max kernel
* metal : warp-based reduce for rms_norm
* metal : simplify soft max kernel ggml-ci
* alloc : fix build with debug
Diffstat (limited to 'gguf-py')
0 files changed, 0 insertions, 0 deletions