ik_llama.cpp.git

author	Kawrakow <iwankawrakow@gmail.com>	2024-10-26 16:26:04 +0200
committer	GitHub <noreply@github.com>	2024-10-26 16:26:04 +0200
commit	bd309cb782ae8a5205dd741ccb97f6103f74888a (patch)
tree	ddbcade915d8158e893a5eebee40e8fd196353ab /src/llama.cpp
parent	3805c84686f40fc4423d45308cab6adac2eafdd4 (diff)

Bitnet CUDA improvements (#109)

* iq1_bn: improve CUDA TG

  On RTX-3080 TG-128(Bitnet-1.58b-3B) goes from 318 t/s to 340 t/s.
  I see I have on the front page 301 t/s, so pretty nice improvement since then.

* iq2_bn(CUDA): quants are not 4-byte aligned

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>

Diffstat (limited to 'src/llama.cpp')

0 files changed, 0 insertions, 0 deletions

