author: Johannes Gäßler <johannesg@5d6.de> 2024-05-18 11:10:47 +0200
committer: GitHub <noreply@github.com> 2024-05-18 11:10:47 +0200
commit: cb42c294279bc4a0a4e926a7b5a5568049f12fa7 (patch)
tree: 1051477e1fd5b0fc6203f00d296fe9ba08ced05f /examples/server
parent: d233b507cd19fcc2d8d8963ecc6a3eb7a33f2ecc (diff)
server: correct --threads documentation [no ci] (#7362)
Diffstat (limited to 'examples/server')
-rw-r--r-- examples/server/README.md 4
1 files changed, 2 insertions, 2 deletions
diff --git a/examples/server/README.md b/examples/server/README.md
index 13cbdcd2..4f3262cd 100644
--- a/examples/server/README.md
+++ b/examples/server/README.md
@@ -18,8 +18,8 @@ The project is under active development, and we are [looking for feedback and co
**Command line options:**
- `-v`, `--verbose`: Enable verbose server output. When using the `/completion` endpoint, this includes the tokenized prompt, the full request and the full response.
-- `-t N`, `--threads N`: Set the number of threads to use during generation. Not used if model layers are offloaded to GPU. The server is using batching. This parameter is used only if one token is to be processed on CPU backend.
-- `-tb N, --threads-batch N`: Set the number of threads to use during batch and prompt processing. If not specified, the number of threads will be set to the number of threads used for generation. Not used if model layers are offloaded to GPU.
+- `-t N`, `--threads N`: Set the number of threads to use by CPU layers during generation. Not used by model layers that are offloaded to GPU. This option has no effect when using the maximum number of GPU layers. Default: `std::thread::hardware_concurrency()` (number of CPU cores).
+- `-tb N, --threads-batch N`: Set the number of threads to use by CPU layers during batch and prompt processing (>= 32 tokens). This option has no effect if a GPU is available. Default: `--threads`.
- `--threads-http N`: Number of threads in the http server pool to process requests. Default: `max(std::thread::hardware_concurrency() - 1, --parallel N + 2)`
- `-m FNAME`, `--model FNAME`: Specify the path to the LLaMA model file (e.g., `models/7B/ggml-model.gguf`).
- `-mu MODEL_URL --model-url MODEL_URL`: Specify a remote http url to download the file. Default: unused
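
The defaults described in the updated option text can be read as a small fallback chain: `--threads` falls back to the detected hardware concurrency, `--threads-batch` falls back to `--threads`, and `--threads-http` falls back to `max(hardware_concurrency - 1, parallel + 2)`. The following is a minimal sketch of that chain, not the server's actual code. The `ThreadOptions` struct, the `resolve_defaults` and `detected_cores` functions, the use of `-1` for "not given", and the `--parallel` default of 1 are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <thread>

// Hypothetical option container for illustration only;
// this is not the server's real parameter struct.
struct ThreadOptions {
    int threads       = -1; // --threads N        (-1 means "not given")
    int threads_batch = -1; // --threads-batch N  (-1 means "not given")
    int threads_http  = -1; // --threads-http N   (-1 means "not given")
    int parallel      = 1;  // --parallel N       (default of 1 is an assumption)
};

// Apply the documented defaults. The core count is passed in as `hw`
// so the logic can be checked without depending on the host machine.
ThreadOptions resolve_defaults(ThreadOptions o, int hw) {
    if (o.threads < 1) {
        o.threads = hw;                 // Default: hardware concurrency
    }
    if (o.threads_batch < 1) {
        o.threads_batch = o.threads;    // Default: --threads
    }
    if (o.threads_http < 1) {
        // Default: max(hardware_concurrency - 1, --parallel N + 2)
        o.threads_http = std::max(hw - 1, o.parallel + 2);
    }
    return o;
}

// std::thread::hardware_concurrency() may return 0 when the value
// cannot be determined, so fall back to 1 in that case.
int detected_cores() {
    unsigned n = std::thread::hardware_concurrency();
    return n > 0 ? static_cast<int>(n) : 1;
}
```

A caller would typically use `resolve_defaults(opts, detected_cores())`. Passing only `--threads 4` in this sketch would also yield a batch thread count of 4, matching the documented `Default: --threads` behaviour.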