author: Johannes Gäßler <johannesg@5d6.de> 2023-12-20 15:41:22 +0100
committer: GitHub <noreply@github.com> 2023-12-20 15:41:22 +0100
commit: 799fc2268989482054944c902874cca76337580f
tree: f535df08f2059a709f8f5b8014d532f1aa086a2d /llama.cpp
parent: 328b83de23b33240e28f4e74900d1d06726f5eb1
CUDA: Faster Mixtral prompt processing (#4538)
* CUDA: make MoE tensors contiguous for batch size>1

* Update ggml-cuda.cu

Co-authored-by: slaren <slarengh@gmail.com>

---------

Co-authored-by: slaren <slarengh@gmail.com>
Diffstat (limited to 'llama.cpp')
0 files changed, 0 insertions, 0 deletions