| Field | Value | Date |
|---|---|---|
| author | Kawrakow <iwankawrakow@gmail.com> | 2024-10-26 16:26:04 +0200 |
| committer | GitHub <noreply@github.com> | 2024-10-26 16:26:04 +0200 |
| commit | bd309cb782ae8a5205dd741ccb97f6103f74888a | |
| tree | ddbcade915d8158e893a5eebee40e8fd196353ab /ggml/src/ggml-cuda/template-instances/generate_cu_files.py | |
| parent | 3805c84686f40fc4423d45308cab6adac2eafdd4 | |
Bitnet CUDA improvements (#109)
* iq1_bn: improve CUDA TG
On RTX-3080 TG-128(Bitnet-1.58b-3B) goes from 318 t/s to 340 t/s.
I see I have on the front page 301 t/s, so pretty nice improvement
since then.
* iq2_bn(CUDA): quants are not 4-byte aligned
---------
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
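
The second item in the message says that iq2_bn quants are not 4-byte aligned. The practical consequence is that the quant bytes cannot be read through a pointer cast to a 32-bit type, because a misaligned 32-bit load is invalid on CUDA devices. The sketch below is not the code from this commit (the diff is not shown here). It is a minimal host-side C++ illustration of two common alignment-safe ways to get a 32-bit value out of such data. The helper names `load_u32_unaligned` and `load_u32_from_u16` are hypothetical.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper (not from the commit): read 32 bits from a byte
// pointer that is only guaranteed to be 1-byte aligned. memcpy makes no
// alignment assumption about the source, unlike *(const uint32_t *)p.
static inline uint32_t load_u32_unaligned(const uint8_t * p) {
    uint32_t v;
    std::memcpy(&v, p, sizeof(v));
    return v;
}

// Hypothetical helper (not from the commit): when the data is known to be
// 2-byte aligned, assemble the value from two 16-bit halves. The low half
// comes first, which matches a native 32-bit load on a little-endian target.
static inline uint32_t load_u32_from_u16(const uint16_t * p) {
    return uint32_t(p[0]) | (uint32_t(p[1]) << 16);
}
```

Either form trades one wide load for a load pattern that stays valid at any offset the data layout allows. Which one the commit actually uses is not visible from this page.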
Diffstat (limited to 'ggml/src/ggml-cuda/template-instances/generate_cu_files.py')
0 files changed, 0 insertions, 0 deletions