author: Markus Tavenrath <mtavenrath@users.noreply.github.com> 2024-06-17 16:10:15 +0200
committer: GitHub <noreply@github.com> 2024-06-17 16:10:15 +0200
commit: 6a2f0b3474d479bda4ac2ee7cfd5dcdcf0be1f79
tree: 093504f65b9e2ff2b1f359e9c9980eb0a17159c2 /ggml-backend.c
parent: 21be9cab94e0b5b53cb6edeeebf8c8c799baad03
Implement non-mapped async IO for CUDA on Windows. (#7896)

* Implement non-mapped async IO for CUDA on Windows. On a fast Gen5 NVMe drive this change improves model load time by >3x while it should be the same (or slightly faster) on any other drive.

* Free resources except for backend.

* Change assertions to exceptions in llama_file, find correct cuda backend to create CUDA resources and respect the use_mmap flag again for CUDA.

* Apply suggestions from code review

Co-authored-by: slaren <slarengh@gmail.com>

* Fix editorconfig and unused variable

* Fix issues with Windows build

---------

Co-authored-by: slaren <slarengh@gmail.com>

Diffstat (limited to 'ggml-backend.c')
0 files changed, 0 insertions, 0 deletions