author: Johannes Gäßler <johannesg@5d6.de> 2024-06-07 11:15:49 +0200
committer: GitHub <noreply@github.com> 2024-06-07 11:15:49 +0200
commit: 7027b27d765db95d4ac6b569d976e387a8715881
tree: de9313cdc2528ca8a9ac06a84caf360dca64ba63 /examples/server
parent: a5cabd76491f07494c5b8267f921c73f5e2bbfb4
server: update cache_prompt documentation [no ci] (#7745)
Diffstat (limited to 'examples/server')
-rw-r--r-- examples/server/README.md | 2
1 files changed, 1 insertions, 1 deletions
diff --git a/examples/server/README.md b/examples/server/README.md
index 0c3db8c8..ccbdcdbd 100644
--- a/examples/server/README.md
+++ b/examples/server/README.md
@@ -279,7 +279,7 @@ node index.js
`id_slot`: Assign the completion task to an specific slot. If is -1 the task will be assigned to a Idle slot. Default: `-1`
- `cache_prompt`: Re-use previously cached prompt from the last request if possible. This may prevent re-caching the prompt from scratch. Default: `false`
+ `cache_prompt`: Re-use KV cache from a previous request if possible. This way the common prefix does not have to be re-processed, only the suffix that differs between the requests. Because (depending on the backend) the logits are **not** guaranteed to be bit-for-bit identical for different batch sizes (prompt processing vs. token generation) enabling this option can cause nondeterministic results. Default: `false`
`system_prompt`: Change the system prompt (initial prompt of all slots), this is useful for chat applications. [See more](#change-system-prompt-on-runtime)
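
The prefix reuse described in the new `cache_prompt` text can be illustrated with a minimal sketch. This is not the server's implementation: the function names `common_prefix_len` and `tokens_to_process` and the token IDs are hypothetical, chosen only to show which part of a prompt would still need processing when the previous request's KV cache is reused.

```python
# Illustrative sketch only: hypothetical helpers, not llama.cpp code.

def common_prefix_len(cached, new):
    """Count the leading tokens shared by the cached and the new prompt."""
    n = 0
    for a, b in zip(cached, new):
        if a != b:
            break
        n += 1
    return n


def tokens_to_process(cached, new, cache_prompt):
    """Return the tokens that still have to be evaluated for the new prompt."""
    if not cache_prompt:
        # Without caching, the whole prompt is processed from scratch.
        return list(new)
    # With caching, only the suffix after the shared prefix is processed.
    return list(new[common_prefix_len(cached, new):])


# Made-up token IDs for two consecutive requests.
previous = [1, 15, 27, 33, 8]
current = [1, 15, 27, 90, 91, 92]

print(tokens_to_process(previous, current, cache_prompt=True))
print(tokens_to_process(previous, current, cache_prompt=False))
```

In this toy case the first three tokens are shared, so only the three differing tokens would be evaluated with caching enabled. The nondeterminism caveat in the commit follows from the same picture: the suffix is evaluated in a different batch configuration than a full re-processing of the prompt would use, and on some backends that can change the logits slightly.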