author    Thái Hoàng Tâm <75922889+RoyalHeart@users.noreply.github.com>  2023-11-05 23:15:27 +0700
committer GitHub <noreply@github.com>  2023-11-05 18:15:27 +0200
commit    bb60fd0bf6bb270744d86dd45b3a95af01b7de45 (patch)
tree      83e6f76b1c993ce4bfeb893dc638b45d909342af /examples/server
parent    132d25b8a62ea084447e0014a0112c1b371fb3f8 (diff)
server : fix typo for --alias shortcut from -m to -a (#3958)
Diffstat (limited to 'examples/server')
-rw-r--r--examples/server/README.md2
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/examples/server/README.md b/examples/server/README.md
index 71500773..089ebe2d 100644
--- a/examples/server/README.md
+++ b/examples/server/README.md
@@ -7,7 +7,7 @@ Command line options:
 - `--threads N`, `-t N`: Set the number of threads to use during generation.
 - `-tb N, --threads-batch N`: Set the number of threads to use during batch and prompt processing. If not specified, the number of threads will be set to the number of threads used for generation.
 - `-m FNAME`, `--model FNAME`: Specify the path to the LLaMA model file (e.g., `models/7B/ggml-model.gguf`).
-- `-m ALIAS`, `--alias ALIAS`: Set an alias for the model. The alias will be returned in API responses.
+- `-a ALIAS`, `--alias ALIAS`: Set an alias for the model. The alias will be returned in API responses.
 - `-c N`, `--ctx-size N`: Set the size of the prompt context. The default is 512, but LLaMA models were built with a context of 2048, which will provide better results for longer input/inference. The size may differ in other models; for example, baichuan models were built with a context of 4096.
 - `-ngl N`, `--n-gpu-layers N`: When compiled with appropriate support (currently CLBlast or cuBLAS), this option allows offloading some layers to the GPU for computation. Generally results in increased performance.
 - `-mg i, --main-gpu i`: When using multiple GPUs this option controls which GPU is used for small tensors for which the overhead of splitting the computation across all GPUs is not worthwhile. The GPU in question will use slightly more VRAM to store a scratch buffer for temporary results. By default GPU 0 is used. Requires cuBLAS.
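The one-character fix above matters because a short flag can map to only one option: `-m` was already taken by `--model`, so `--alias` needs its own shortcut, `-a`. A minimal sketch of the two flags involved, using Python's `argparse` purely for illustration (the real llama.cpp server parses its arguments in C++), shows the intended pairing:

```python
import argparse

# Hypothetical mirror of the two server flags touched by this fix;
# names and defaults follow the README, not the actual C++ parser.
parser = argparse.ArgumentParser(prog="server")
parser.add_argument("-m", "--model", metavar="FNAME",
                    help="path to the LLaMA model file")
parser.add_argument("-a", "--alias", metavar="ALIAS",
                    help="alias for the model, returned in API responses")

args = parser.parse_args(["-m", "models/7B/ggml-model.gguf", "-a", "llama-7b"])
print(args.model)  # -> models/7B/ggml-model.gguf
print(args.alias)  # -> llama-7b

# Registering --alias under -m as well, as the old README implied,
# would raise argparse.ArgumentError: conflicting option string: -m
```

The same ambiguity would confront any parser: given `-m` bound to two options, there is no way to tell which one the user meant, so the documentation has to advertise a distinct shortcut for each.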