llama : save and restore kv cache for single seq id (#6341)

* llama : save and restore kv cache for single seq id
* remove trailing whitespace
* respond error in case there's no space in the kv cache
* add kv seq save restore to test case
* add --slot-save-path arg to enable save restore and restrict save location
* Returning 0 for some cases, instead of asserting.
* cleanup error cases
* rename sequence state functions
* rename state get set functions
* add previous function names back in with DEPRECATED notice
* update doc
* adjust endpoints to preferred style
* fix restoring zero cell count
* handle seq rm return value
* unused param
* keep in the size check
* fix return types
* add server test case for slot save restore
* cleanup
* add cake
* cleanup style
* add special
* removing a whole sequence never fails
* move sequence state file functionality from server to llama to match session api and add version tags
* catch exceptions on save as well
* error log messages
* check types for stricter restore
* update server doc
* readme : update API changes date
* strict filename validation
* move include, reject bom as well
* also reject empty filename
* reject whitespace and trailing dot

---------

Co-authored-by: Martin Evans <martindevans@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
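The last few items in this commit message tighten how the filename for slot save/restore is validated, so that files stay inside the directory given by `--slot-save-path`. The rules can be sketched roughly as follows. This is a hypothetical helper, `is_valid_slot_filename`, not the function llama.cpp actually uses; reading "reject whitespace" as leading/trailing whitespace, and also rejecting path separators, control characters and other reserved characters, are assumptions of this sketch.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (NOT the llama.cpp implementation): applies the filename
// rules named in the commit message to a bare slot-save file name.
bool is_valid_slot_filename(const std::string & name) {
    // Reject empty filenames.
    if (name.empty()) {
        return false;
    }
    // Reject a leading UTF-8 byte order mark (EF BB BF).
    if (name.size() >= 3 &&
        static_cast<unsigned char>(name[0]) == 0xEF &&
        static_cast<unsigned char>(name[1]) == 0xBB &&
        static_cast<unsigned char>(name[2]) == 0xBF) {
        return false;
    }
    // Assumption: "reject whitespace" means leading or trailing whitespace.
    auto is_space = [](char c) {
        return c == ' ' || c == '\t' || c == '\n' || c == '\r' || c == '\v' || c == '\f';
    };
    // Reject leading/trailing whitespace and a trailing dot
    // (the trailing-dot rule also rejects "." and "..").
    if (is_space(name.front()) || is_space(name.back()) || name.back() == '.') {
        return false;
    }
    // Assumption: reject control characters, path separators and characters
    // reserved on common filesystems, so the name cannot escape the save dir.
    for (char c : name) {
        const unsigned char u = static_cast<unsigned char>(c);
        if (u < 0x20 || u == 0x7F) {
            return false;
        }
        switch (c) {
            case '/': case '\\': case ':': case '*': case '?':
            case '"': case '<': case '>': case '|':
                return false;
            default:
                break;
        }
    }
    return true;
}
```

For example, `is_valid_slot_filename("slot0.bin")` returns true, while `""`, `" slot0.bin"`, `"slot0."` and `"../slot0.bin"` are all rejected. Because the trailing-dot rule already covers `.` and `..`, the sketch needs no separate check for them.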
author: Jan Boon <jan.boon@kaetemi.be> 2024-04-08 20:43:30 +0800
committer: GitHub <noreply@github.com> 2024-04-08 15:43:30 +0300
commit: beea6e1b16e783a0886e78dec01002a8c00db24d (patch)
tree: a7365b1e93145b78a8b4be72df959239aa8c0f0d /README.md
parent: 87fb5b4234d4b9c56ac94cf7aa229c8fd7defdb0 (diff)
1 file changed, 1 insertion, 0 deletions
diff --git a/README.md b/README.md
index 6d996112..a4897fc3 100644
--- a/README.md
+++ b/README.md
@@ -10,6 +10,7 @@ Inference of Meta's [LLaMA](https://arxiv.org/abs/2302.13971) model (and others)
 
 ### Recent API changes
 
+- [2024 Apr 4] State and session file functions reorganized under `llama_state_*` https://github.com/ggerganov/llama.cpp/pull/6341
 - [2024 Mar 26] Logits and embeddings API updated for compactness https://github.com/ggerganov/llama.cpp/pull/6122
 - [2024 Mar 13] Add `llama_synchronize()` + `llama_context_params.n_ubatch` https://github.com/ggerganov/llama.cpp/pull/6017
 - [2024 Mar 8] `llama_kv_cache_seq_rm()` returns a `bool` instead of `void`, and new `llama_n_seq_max()` returns the upper limit of acceptable `seq_id` in batches (relevant when dealing with multiple sequences) https://github.com/ggerganov/llama.cpp/pull/5328