author | Johannes Gäßler <johannesg@5d6.de> | 2023-12-20 15:41:22 +0100 |
---|---|---|
committer | GitHub <noreply@github.com> | 2023-12-20 15:41:22 +0100 |
commit | 799fc2268989482054944c902874cca76337580f (patch) | |
tree | f535df08f2059a709f8f5b8014d532f1aa086a2d /llama.cpp | |
parent | 328b83de23b33240e28f4e74900d1d06726f5eb1 (diff) |
CUDA: Faster Mixtral prompt processing (#4538)
* CUDA: make MoE tensors contiguous for batch size>1
* Update ggml-cuda.cu
Co-authored-by: slaren <slarengh@gmail.com>
---------
Co-authored-by: slaren <slarengh@gmail.com>
Diffstat (limited to 'llama.cpp')
0 files changed, 0 insertions, 0 deletions