| author | Iwan Kawrakow <iwan.kawrakow@gmail.com> | 2024-06-19 19:51:39 +0300 |
|---|---|---|
| committer | Iwan Kawrakow <iwan.kawrakow@gmail.com> | 2024-06-22 12:02:52 +0300 |
| commit | e73ae1f6d31074f774741a592382ec62a9de6dbf (patch) | |
| tree | e9fc2d42af4a5894703d715af5d3b1d48edd0e0a /llama.cpp | |
| parent | 7f968d51b4eb6f403bb7dbc1a5bbf98491ff293b (diff) | |
bitnet(scale in a separate tensor): mul -> scale on CUDA
On CUDA we do not have access to the tensor data until we
hit the kernel, hence this hack.
In any case, iq2_bn goes back up to 228 t/s, which is close
to the 234 t/s we get without the extra scale operation.
PP (prompt processing) is 9400 t/s, down from 9600 t/s, but better
than the 9200 t/s we get without making the mul -> scale replacement.
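The replacement boils down to reading the 1-element scale tensor on the device rather than folding it into a host-side constant, since the host cannot see the tensor's data before kernel launch. Below is a minimal CUDA sketch of that idea; the kernel name, signature, and launch setup are illustrative assumptions, not the actual ggml/ik_llama.cpp code:

```cuda
#include <cuda_runtime.h>

// Hypothetical sketch: instead of a scale op that takes a host float,
// the kernel receives a pointer to the 1-element scale tensor living in
// device memory and dereferences it on the GPU.
__global__ void scale_from_tensor(float * dst, const float * src,
                                  const float * scale_ptr, int n) {
    // Every thread reads the same device-resident scale; after the first
    // read it is served from cache, so the extra load is cheap.
    const float s = *scale_ptr;
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        dst[i] = src[i] * s;
    }
}

// Illustrative launch: a regular host-float scale kernel could not be used
// here, because the scale value is not known on the host at graph-eval time.
void launch_scale_from_tensor(float * dst, const float * src,
                              const float * scale_ptr, int n,
                              cudaStream_t stream) {
    const int block = 256;
    const int grid  = (n + block - 1) / block;
    scale_from_tensor<<<grid, block, 0, stream>>>(dst, src, scale_ptr, n);
}
```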
Diffstat (limited to 'llama.cpp')
0 files changed, 0 insertions, 0 deletions