path: root/src/llama.cpp
author Kawrakow <iwankawrakow@gmail.com> 2024-10-26 16:26:04 +0200
committer GitHub <noreply@github.com> 2024-10-26 16:26:04 +0200
commit bd309cb782ae8a5205dd741ccb97f6103f74888a
tree ddbcade915d8158e893a5eebee40e8fd196353ab /src/llama.cpp
parent 3805c84686f40fc4423d45308cab6adac2eafdd4
Bitnet CUDA improvements (#109)

* iq1_bn: improve CUDA TG

  On RTX-3080, TG-128 (Bitnet-1.58b-3B) goes from 318 t/s to 340 t/s.
  I see I have 301 t/s on the front page, so a pretty nice improvement
  since then.

* iq2_bn (CUDA): quants are not 4-byte aligned

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>

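The iq2_bn note says the quants are not 4-byte aligned, so a kernel cannot safely read them through a 32-bit pointer. The commit text does not show how this was handled. The sketch below is host-side C++ illustrating one common way to assemble a 32-bit word from data that is only 2-byte aligned. The helper names are hypothetical and not taken from the commit, and a little-endian target is assumed.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper (not from the commit): read a 32-bit word from an
// address that is only guaranteed to be 2-byte aligned, using two 16-bit
// loads instead of one 32-bit load. Assumes a little-endian target.
static inline uint32_t load_u32_from_2byte_aligned(const uint8_t * p) {
    uint16_t lo, hi;
    std::memcpy(&lo, p,     sizeof(lo));  // low half: bytes 0..1
    std::memcpy(&hi, p + 2, sizeof(hi));  // high half: bytes 2..3
    return uint32_t(lo) | (uint32_t(hi) << 16);
}

// Round-trip check: store a word at offset 2 of a 4-byte-aligned buffer
// (so the address is 2-byte aligned but not 4-byte aligned) and read it back.
static inline bool roundtrip_ok() {
    alignas(4) uint8_t buf[8] = {};
    const uint32_t value = 0x12345678u;
    std::memcpy(buf + 2, &value, sizeof(value));
    return load_u32_from_2byte_aligned(buf + 2) == value;
}
```

Casting `buf + 2` to `const uint32_t *` and dereferencing it would be undefined behavior in C++, and on CUDA devices a misaligned 32-bit access is an error, which is why the load is split into two halves here.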
Diffstat (limited to 'src/llama.cpp')
0 files changed, 0 insertions, 0 deletions