diff options
author | crasm <crasm@git.vczf.us> | 2023-12-22 01:19:36 -0500
---|---|---
committer | GitHub <noreply@github.com> | 2023-12-22 08:19:36 +0200
commit | c7e9701f86564088350209d2f9d71c96ea00527f (patch)
tree | 5d0df4199dd2a4b515b94e619d5feec5c6760a1d /llama.h
parent | afefa319f1f59b002dfa0d1ef407a2c74bd9770b (diff)
llama : add ability to cancel model loading (#4462)
* llama : Add ability to cancel model load
Updated llama_progress_callback so that if it returns false, the model
loading is aborted.
* llama : Add test for model load cancellation
* Fix bool return in llama_model_load, remove std::ignore use
* Update llama.cpp
Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>
* Fail test if model file is missing
* Revert "Fail test if model file is missing"
This reverts commit 32ebd525bf7e5a87ee8a3dbaab3d92ce79fbf23d.
* Add test-model-load-cancel to Makefile
* Revert "Revert "Fail test if model file is missing""
This reverts commit 2796953257ee5383fa7c8fe8fa8fc888c048fb0b.
* Simplify .gitignore for tests, clang-tidy fixes
* Label all ctest tests
* ci : ctest uses -L main
* Attempt at writing ctest_with_model
* ci : get ci/run.sh working with test-model-load-cancel
* ci : restrict .github/workflows/build.yml ctest to -L main
* update requirements.txt
* Disable test-model-load-cancel in make
* Remove venv before creation
* Restructure requirements.txt
The top-level file now imports the specific additional requirements for each
Python script. Using `pip install -r requirements.txt` will fail if
versions become mismatched in the per-file requirements.
* Make per-python-script requirements work alone
This doesn't break the main requirements.txt.
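The restructuring described above relies on pip's `-r` include directive, which lets one requirements file pull in others. A minimal sketch of that layout (the file names below are hypothetical, not the actual paths used in the repository):

```
# requirements.txt (top level) -- includes the per-script files via -r
-r requirements/requirements-convert.txt
-r requirements/requirements-convert-persimmon-to-gguf.txt
```

With this layout, `pip install -r requirements.txt` resolves all per-script files together, so conflicting version pins across them cause the install to fail loudly rather than silently picking one.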
* Add comment
* Add convert-persimmon-to-gguf.py to new requirements.txt scheme
* Add check-requirements.sh script and GitHub workflow
* Remove shellcheck installation step from workflow
* Add nocleanup special arg
* Fix merge
see: https://github.com/ggerganov/llama.cpp/pull/4462#discussion_r1434593573
* reset to upstream/master
* Redo changes for cancelling model load
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>
Diffstat (limited to 'llama.h')
-rw-r--r-- | llama.h | 6
1 file changed, 4 insertions(+), 2 deletions(-)
@@ -127,7 +127,7 @@ extern "C" {
         bool sorted;
     } llama_token_data_array;

-    typedef void (*llama_progress_callback)(float progress, void *ctx);
+    typedef bool (*llama_progress_callback)(float progress, void *ctx);

     // Input data for llama_decode
     // A llama_batch object can contain input about one or many sequences
@@ -180,7 +180,9 @@ extern "C" {
         int32_t main_gpu; // the GPU that is used for scratch and small tensors
         const float * tensor_split; // how to split layers across multiple GPUs (size: LLAMA_MAX_DEVICES)

-        // called with a progress value between 0 and 1, pass NULL to disable
+        // Called with a progress value between 0.0 and 1.0. Pass NULL to disable.
+        // If the provided progress_callback returns true, model loading continues.
+        // If it returns false, model loading is immediately aborted.
         llama_progress_callback progress_callback;

         // context pointer passed to the progress callback