author: Kawrakow <iwankawrakow@gmail.com> 2024-09-28 17:41:21 +0300
committer: GitHub <noreply@github.com> 2024-09-28 17:41:21 +0300
commit: 7abcc6cc0b0a48b780bb0877e4720c46a7e3c255
tree: 0fbe1e7c3d02462ba3972f9be88a39ab9a3c0c30 /src/llama-sampling.cpp
parent: 737514fd814d944f8ce965620293a16e5e8a285d
CUDA non-contiguous RoPE (#66)
This avoids the Q, K, and V copies that are otherwise made after multiplication with the QKV tensor in models such as Phi-3.5-mini. The result is a 6-7% speedup of PP-512 (Phi-3.5-mini) on CUDA (RTX-4080).

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
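To illustrate why a stride-aware RoPE makes the copies unnecessary, here is a minimal CPU sketch in C++. It is not the CUDA kernel from this commit. The `StridedView` struct, the `rope_inplace` function, the packed row layout `[Q | K | V]`, and the adjacent-pair rotation convention are all assumptions chosen for illustration. The point is only that, given a token stride and a head stride, RoPE can rotate Q or K in place inside the packed buffer.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical strided view into a larger buffer: element (token t, head h, dim d)
// lives at data[t * token_stride + h * head_stride + d]. Strides are in elements.
struct StridedView {
    float *     data;
    std::size_t n_tokens;
    std::size_t n_heads;
    std::size_t head_dim;      // assumed even
    std::size_t token_stride;  // elements between consecutive tokens
    std::size_t head_stride;   // elements between consecutive heads
};

// Rotate adjacent pairs (x[2i], x[2i+1]) in place by the angle
// positions[t] * base^(-2i / head_dim). Other RoPE variants pair elements differently.
void rope_inplace(const StridedView & v, const std::vector<int> & positions,
                  float base = 10000.0f) {
    for (std::size_t t = 0; t < v.n_tokens; ++t) {
        for (std::size_t h = 0; h < v.n_heads; ++h) {
            // No contiguity assumption: the row start comes from the strides.
            float * x = v.data + t * v.token_stride + h * v.head_stride;
            for (std::size_t i = 0; i < v.head_dim / 2; ++i) {
                const float theta = positions[t] * std::pow(base, -2.0f * i / v.head_dim);
                const float c  = std::cos(theta);
                const float s  = std::sin(theta);
                const float x0 = x[2 * i];
                const float x1 = x[2 * i + 1];
                x[2 * i]     = x0 * c - x1 * s;
                x[2 * i + 1] = x0 * s + x1 * c;
            }
        }
    }
}
```

With a packed row of `row_size` floats per token, a Q view would use `data = qkv`, `token_stride = row_size`, and `head_stride = head_dim`. A K view would differ only in its starting offset. A RoPE that requires contiguous input would first need each of these views copied into its own buffer.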
Diffstat (limited to 'src/llama-sampling.cpp')
0 files changed, 0 insertions, 0 deletions