diff options
author | crasm <crasm@git.vczf.us> | 2023-12-22 01:19:36 -0500
---|---|---
committer | GitHub <noreply@github.com> | 2023-12-22 08:19:36 +0200
commit | c7e9701f86564088350209d2f9d71c96ea00527f (patch)
tree | 5d0df4199dd2a4b515b94e619d5feec5c6760a1d /llama.h
parent | afefa319f1f59b002dfa0d1ef407a2c74bd9770b (diff)
llama : add ability to cancel model loading (#4462)
* llama : Add ability to cancel model load
Updated llama_progress_callback so that if it returns false, the model
loading is aborted.
* llama : Add test for model load cancellation
* Fix bool return in llama_model_load, remove std::ignore use
* Update llama.cpp
Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>
* Fail test if model file is missing
* Revert "Fail test if model file is missing"
This reverts commit 32ebd525bf7e5a87ee8a3dbaab3d92ce79fbf23d.
* Add test-model-load-cancel to Makefile
* Revert "Revert "Fail test if model file is missing""
This reverts commit 2796953257ee5383fa7c8fe8fa8fc888c048fb0b.
* Simplify .gitignore for tests, clang-tidy fixes
* Label all ctest tests
* ci : ctest uses -L main
* Attempt at writing ctest_with_model
* ci : get ci/run.sh working with test-model-load-cancel
* ci : restrict .github/workflows/build.yml ctest to -L main
* update requirements.txt
* Disable test-model-load-cancel in make
* Remove venv before creation
* Restructure requirements.txt
The top-level file now imports the specific additional requirements for each
Python script. Using `pip install -r requirements.txt` will fail if
versions become mismatched in the per-file requirements.
* Make per-python-script requirements work alone
This doesn't break the main requirements.txt.
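The restructuring described above relies on pip's `-r` include directive, which lets one requirements file pull in others. A minimal sketch of that layout (the file names below are hypothetical, not the actual paths used in the repository):

```
# requirements.txt (top level) -- includes the per-script files via -r
-r requirements/requirements-convert.txt
-r requirements/requirements-convert-persimmon-to-gguf.txt
```

With this layout, `pip install -r requirements.txt` resolves all per-script files together, so conflicting version pins across them cause the install to fail loudly rather than silently picking one.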
* Add comment
* Add convert-persimmon-to-gguf.py to new requirements.txt scheme
* Add check-requirements.sh script and GitHub workflow
* Remove shellcheck installation step from workflow
* Add nocleanup special arg
* Fix merge
see: https://github.com/ggerganov/llama.cpp/pull/4462#discussion_r1434593573
* reset to upstream/master
* Redo changes for cancelling model load
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>
Diffstat (limited to 'llama.h')
-rw-r--r-- | llama.h | 6
1 file changed, 4 insertions(+), 2 deletions(-)
@@ -127,7 +127,7 @@ extern "C" {
         bool sorted;
     } llama_token_data_array;

-    typedef void (*llama_progress_callback)(float progress, void *ctx);
+    typedef bool (*llama_progress_callback)(float progress, void *ctx);

     // Input data for llama_decode
     // A llama_batch object can contain input about one or many sequences
@@ -180,7 +180,9 @@ extern "C" {
         int32_t main_gpu; // the GPU that is used for scratch and small tensors
         const float * tensor_split; // how to split layers across multiple GPUs (size: LLAMA_MAX_DEVICES)

-        // called with a progress value between 0 and 1, pass NULL to disable
+        // Called with a progress value between 0.0 and 1.0. Pass NULL to disable.
+        // If the provided progress_callback returns true, model loading continues.
+        // If it returns false, model loading is immediately aborted.
         llama_progress_callback progress_callback;

         // context pointer passed to the progress callback