field | value | date |
---|---|---|
author | Kawrakow <iwankawrakow@gmail.com> | 2024-09-28 17:41:21 +0300 |
committer | GitHub <noreply@github.com> | 2024-09-28 17:41:21 +0300 |
commit | 7abcc6cc0b0a48b780bb0877e4720c46a7e3c255 | |
tree | 0fbe1e7c3d02462ba3972f9be88a39ab9a3c0c30 /src/llama-sampling.cpp | |
parent | 737514fd814d944f8ce965620293a16e5e8a285d | |
CUDA non-contiguous RoPE (#66)
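Supporting non-contiguous tensors in the CUDA RoPE implementation lets
us skip the copies of Q, K, and V that were otherwise made after the
multiplication with the fused QKV tensor in, e.g., Phi-3.5-mini.
This gives a 6-7% speedup for PP-512 (prompt processing of 512 tokens)
with Phi-3.5-mini on CUDA (RTX 4080).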
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
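A minimal CPU-side sketch of the idea, not the CUDA code from this commit. It assumes the fused QKV output is stored as one row of `[q | k | v]` per token, uses the adjacent-pair RoPE formulation for simplicity (the pairing used by a given model may differ), and takes the token index as the position. `rope_strided` and `gather` are hypothetical names. Rotating Q and K in place through row and head strides gives exactly the same values as first copying them into contiguous buffers, which is the copy the commit avoids.

```python
import math

def rope_strided(buf, offset, n_tokens, n_heads, head_dim, row_stride, head_stride, base=10000.0):
    """Apply RoPE in place to a strided view of a flat buffer (no copy).
    Position is assumed to be the token index; dims within a head are contiguous."""
    for t in range(n_tokens):
        for h in range(n_heads):
            start = offset + t * row_stride + h * head_stride
            for i in range(head_dim // 2):
                theta = t * base ** (-2.0 * i / head_dim)
                c, s = math.cos(theta), math.sin(theta)
                a, b = buf[start + 2 * i], buf[start + 2 * i + 1]
                buf[start + 2 * i] = a * c - b * s
                buf[start + 2 * i + 1] = a * s + b * c

def gather(buf, offset, row_stride, n_tokens, width):
    """Copy a strided view into a contiguous list (the copy the commit avoids)."""
    return [buf[offset + t * row_stride + d] for t in range(n_tokens) for d in range(width)]

n_tokens, n_heads, head_dim = 4, 2, 8
n_embd = n_heads * head_dim
# Fused QKV output (assumed layout): each token row holds 3 * n_embd floats.
qkv = [float((j * 7) % 13) - 6.0 for j in range(n_tokens * 3 * n_embd)]

fused = list(qkv)          # working copy of the fused output
matches = []
for off in (0, n_embd):    # Q starts at column 0, K at column n_embd
    # Copy path: gather the slice into a contiguous buffer, then rotate it.
    ref = gather(qkv, off, 3 * n_embd, n_tokens, n_embd)
    rope_strided(ref, 0, n_tokens, n_heads, head_dim, n_embd, head_dim)
    # No-copy path: rotate the slice in place through its strides.
    rope_strided(fused, off, n_tokens, n_heads, head_dim, 3 * n_embd, head_dim)
    matches.append(gather(fused, off, 3 * n_embd, n_tokens, n_embd) == ref)
print(matches)  # [True, True]
```

The only difference between the two paths is addressing: the no-copy path passes the wider row stride of the fused tensor instead of materializing a contiguous Q or K, and V is never touched.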
Diffstat (limited to 'src/llama-sampling.cpp')
0 files changed, 0 insertions, 0 deletions