path: root/examples/server/README.md
author: Pierrick Hymbert <pierrick.hymbert@gmail.com> 2024-02-18 17:30:09 +0100
committer: GitHub <noreply@github.com> 2024-02-18 18:30:09 +0200
commit: 36376abe05a12a8cb3af548a4af9b8d0e2e69597
tree: 02a4b73a978c23db9e979b4cc0f57ff445ede597 /examples/server/README.md
parent: 66c1968f7a2e895675425e875b6589f1233a1b52
server : --n-predict option document and cap to max value (#5549)
* server: document --n-predict
* server: ensure client request cannot override n_predict if set
* server: fix print usage LF in new --n-predict option
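The capping rule described in the second bullet can be illustrated with a minimal sketch. This is not the server's actual code: the function name `effective_n_predict` is hypothetical, and two behaviors are assumptions made for illustration only, namely that a negative server value means "no cap" (matching the documented default of -1) and that a negative client value falls back to the server cap.

```python
def effective_n_predict(requested: int, server_limit: int = -1) -> int:
    """Return the token budget to use for a single request.

    Hypothetical helper, not part of llama.cpp.
    Assumption: a negative server_limit means the server sets no cap.
    Assumption: a negative requested value means the client sets no limit.
    """
    if server_limit < 0:
        # No server-side cap configured: honor the client value as-is.
        return requested
    if requested < 0 or requested > server_limit:
        # The client asked for "unlimited" or for more than the cap allows.
        return server_limit
    return requested


if __name__ == "__main__":
    # A server started with `-n 128` and a client asking for 512 tokens.
    print(effective_n_predict(512, server_limit=128))
```

Under these assumptions, a request below the cap passes through unchanged, and any request above it is silently reduced to the server's value.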
Diffstat (limited to 'examples/server/README.md')
-rw-r--r-- examples/server/README.md 1
1 files changed, 1 insertions, 0 deletions
diff --git a/examples/server/README.md b/examples/server/README.md
index 24936874..fe5cd8d5 100644
--- a/examples/server/README.md
+++ b/examples/server/README.md
@@ -39,6 +39,7 @@ see https://github.com/ggerganov/llama.cpp/issues/1437
- `--mmproj MMPROJ_FILE`: Path to a multimodal projector file for LLaVA.
- `--grp-attn-n`: Set the group attention factor to extend context size through self-extend(default: 1=disabled), used together with group attention width `--grp-attn-w`
- `--grp-attn-w`: Set the group attention width to extend context size through self-extend(default: 512), used together with group attention factor `--grp-attn-n`
+- `-n, --n-predict`: Set the maximum tokens to predict (default: -1)
## Build