ik_llama.cpp.git

author	Markus Tavenrath <mtavenrath@users.noreply.github.com>	2024-06-17 16:10:15 +0200
committer	GitHub <noreply@github.com>	2024-06-17 16:10:15 +0200
commit	6a2f0b3474d479bda4ac2ee7cfd5dcdcf0be1f79 (patch)
tree	093504f65b9e2ff2b1f359e9c9980eb0a17159c2 /ggml-backend.c
parent	21be9cab94e0b5b53cb6edeeebf8c8c799baad03 (diff)

Implement non-mapped async IO for CUDA on Windows. (#7896)

* Implement non-mapped async IO for CUDA on Windows. On a fast Gen5 NVMe drive this change improves model load time by >3x while it should be the same (or slightly faster) on any other drive.

* Free resources except for backend.

* Change assertions to exceptions in llama_file, find correct cuda backend to create CUDA resources and respect the use_mmap flag again for CUDA.

* Apply suggestions from code review

Co-authored-by: slaren <slarengh@gmail.com>

* Fix editorconfig and unused variable

* Fix issues with Windows build

---------

Co-authored-by: slaren <slarengh@gmail.com>
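
The commit message mentions changing assertions to exceptions in llama_file. The sketch below illustrates that general pattern only and is not the code from this commit: `file_reader` and its `read_raw` member are hypothetical names chosen for the example. The idea is that a failed open or a short read throws `std::runtime_error`, which a caller can catch and report, where an `assert` would abort the whole process.

```cpp
#include <cstddef>
#include <cstdio>
#include <stdexcept>
#include <string>

// Hypothetical file wrapper for illustration; not the actual llama_file code.
struct file_reader {
    std::FILE * fp;

    // Throw instead of asserting when the file cannot be opened.
    explicit file_reader(const char * path) : fp(std::fopen(path, "rb")) {
        if (fp == nullptr) {
            throw std::runtime_error(std::string("failed to open ") + path);
        }
    }

    ~file_reader() {
        if (fp != nullptr) {
            std::fclose(fp);
        }
    }

    // The wrapper owns the FILE handle, so copying is disabled.
    file_reader(const file_reader &) = delete;
    file_reader & operator=(const file_reader &) = delete;

    // Read exactly len bytes into dst, or throw on a short read or I/O error.
    void read_raw(void * dst, std::size_t len) {
        if (len == 0) {
            return;
        }
        const std::size_t n = std::fread(dst, 1, len, fp);
        if (n != len) {
            throw std::runtime_error("unexpected end of file or read error");
        }
    }
};
```

A caller would wrap the load in `try { ... } catch (const std::runtime_error & e) { ... }` and fail the model load cleanly, for example by logging `e.what()` and returning an error code.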

Diffstat (limited to 'ggml-backend.c')

0 files changed, 0 insertions, 0 deletions
