ik_llama.cpp.git

author	Kawrakow <iwankawrakow@gmail.com>	2024-09-28 17:41:21 +0300
committer	GitHub <noreply@github.com>	2024-09-28 17:41:21 +0300
commit	7abcc6cc0b0a48b780bb0877e4720c46a7e3c255 (patch)
tree	0fbe1e7c3d02462ba3972f9be88a39ab9a3c0c30 /src/llama-sampling.cpp
parent	737514fd814d944f8ce965620293a16e5e8a285d (diff)

CUDA non-contiguous RoPE (#66)

In this way we can avoid the Q, K, V copies being made after multiplication with the QKV tensor in, e.g., Phi-3.5-mini. This results in a 6-7% speedup of PP-512(Phi-3.5-mini) on CUDA (RTX-4080).

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>

Diffstat (limited to 'src/llama-sampling.cpp')

0 files changed, 0 insertions, 0 deletions

