server: correct --threads documentation [no ci] (#7362)

author: Johannes Gäßler <johannesg@5d6.de> 2024-05-18 11:10:47 +0200
committer: GitHub <noreply@github.com> 2024-05-18 11:10:47 +0200
commit: cb42c294279bc4a0a4e926a7b5a5568049f12fa7 (patch)
tree: 1051477e1fd5b0fc6203f00d296fe9ba08ced05f /examples/server
parent: d233b507cd19fcc2d8d8963ecc6a3eb7a33f2ecc (diff)
1 files changed, 2 insertions, 2 deletions
diff --git a/examples/server/README.md b/examples/server/README.md
index 13cbdcd2..4f3262cd 100644
--- a/examples/server/README.md
+++ b/examples/server/README.md
@@ -18,8 +18,8 @@ The project is under active development, and we are [looking for feedback and co
 **Command line options:**
 
 - `-v`, `--verbose`: Enable verbose server output. When using the `/completion` endpoint, this includes the tokenized prompt, the full request and the full response.
-- `-t N`, `--threads N`: Set the number of threads to use during generation. Not used if model layers are offloaded to GPU. The server is using batching. This parameter is used only if one token is to be processed on CPU backend.
-- `-tb N, --threads-batch N`: Set the number of threads to use during batch and prompt processing. If not specified, the number of threads will be set to the number of threads used for generation. Not used if model layers are offloaded to GPU.
+- `-t N`, `--threads N`: Set the number of threads to use by CPU layers during generation. Not used by model layers that are offloaded to GPU. This option has no effect when using the maximum number of GPU layers. Default: `std::thread::hardware_concurrency()` (number of CPU cores).
+- `-tb N, --threads-batch N`: Set the number of threads to use by CPU layers during batch and prompt processing (>= 32 tokens). This option has no effect if a GPU is available. Default: `--threads`.
 - `--threads-http N`: Number of threads in the http server pool to process requests. Default: `max(std::thread::hardware_concurrency() - 1, --parallel N + 2)`
 - `-m FNAME`, `--model FNAME`: Specify the path to the LLaMA model file (e.g., `models/7B/ggml-model.gguf`).
 - `-mu MODEL_URL --model-url MODEL_URL`: Specify a remote http url to download the file. Default: unused
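
The defaults described in the updated option text chain together: `--threads-batch` falls back to `--threads`, which falls back to the hardware thread count. A minimal C++ sketch of that resolution follows. The struct, the function names, the `-1` "not specified" sentinel, and the `--parallel` default of 1 are assumptions made for illustration; this is not the server's actual code.

```cpp
#include <algorithm>
#include <thread>

// Hypothetical container for the options discussed above (not the server's real type).
struct ThreadOptions {
    int threads       = -1; // --threads N        (-1 = not specified, assumed sentinel)
    int threads_batch = -1; // --threads-batch N  (-1 = not specified, assumed sentinel)
    int threads_http  = -1; // --threads-http N   (-1 = not specified, assumed sentinel)
    int parallel      = 1;  // --parallel N       (default value assumed)
};

// Fill in unspecified values using the defaults stated in the README text.
// hw_threads is a parameter so the logic can be checked with a fixed value.
ThreadOptions resolve_thread_defaults(ThreadOptions opt, int hw_threads) {
    if (opt.threads < 1) {
        opt.threads = hw_threads;            // Default: hardware_concurrency()
    }
    if (opt.threads_batch < 1) {
        opt.threads_batch = opt.threads;     // Default: --threads
    }
    if (opt.threads_http < 1) {
        // Default: max(hardware_concurrency() - 1, --parallel N + 2)
        opt.threads_http = std::max(hw_threads - 1, opt.parallel + 2);
    }
    return opt;
}

// Convenience overload that queries the machine.
ThreadOptions resolve_thread_defaults(ThreadOptions opt) {
    int hw = static_cast<int>(std::thread::hardware_concurrency());
    if (hw < 1) {
        hw = 1; // hardware_concurrency() may return 0 when the count is unknown
    }
    return resolve_thread_defaults(opt, hw);
}
```

With 8 hardware threads and no options given, this sketch resolves `--threads` and `--threads-batch` to 8 and `--threads-http` to 7. Per the option text, the first two values only matter for layers that run on the CPU.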