author | Markus Tavenrath <mtavenrath@users.noreply.github.com> | 2024-06-17 16:10:15 +0200 |
---|---|---|
committer | GitHub <noreply@github.com> | 2024-06-17 16:10:15 +0200 |
commit | 6a2f0b3474d479bda4ac2ee7cfd5dcdcf0be1f79 (patch) | |
tree | 093504f65b9e2ff2b1f359e9c9980eb0a17159c2 /ggml-backend.c | |
parent | 21be9cab94e0b5b53cb6edeeebf8c8c799baad03 (diff) |
Implement non-mapped async IO for CUDA on Windows. (#7896)
* Implement non-mapped async IO for CUDA on Windows. On a fast Gen5 NVMe drive, this change improves model load time by more than 3x; on any other drive, load time should be the same or slightly faster.
* Free resources except for backend.
* Change assertions to exceptions in llama_file, find the correct CUDA backend to create CUDA resources, and respect the use_mmap flag again for CUDA.
* Apply suggestions from code review
Co-authored-by: slaren <slarengh@gmail.com>
* Fix editorconfig and unused variable
* Fix issues with Windows build
---------
Co-authored-by: slaren <slarengh@gmail.com>
Diffstat (limited to 'ggml-backend.c')
0 files changed, 0 insertions, 0 deletions