server : update readme with undocumented options (#7013)

author: Kyle Mistele <kyle@mistele.com> 2024-05-07 13:44:29 -0500
committer: GitHub <noreply@github.com> 2024-05-07 21:44:29 +0300
commit: 260b7c65296fba0568eeb1ff05244ea0be206b54 (patch)
tree: bc299b115674cfbb5f0f256db2fbfa9fe375f60a
parent: 53d6c52e227dedef347b21e28febcfb9caeecdad (diff)
1 files changed, 12 insertions, 0 deletions
diff --git a/examples/server/README.md b/examples/server/README.md
index b96a4444..bf371364 100644
--- a/examples/server/README.md
+++ b/examples/server/README.md
@@ -62,6 +62,18 @@ page cache before using this. See https://github.com/ggerganov/llama.cpp/issues/
 - `--chat-template JINJA_TEMPLATE`: Set custom jinja chat template. This parameter accepts a string, not a file name.  Default: template taken from model's metadata. We only support [some pre-defined templates](https://github.com/ggerganov/llama.cpp/wiki/Templates-supported-by-llama_chat_apply_template)
 - `--log-disable`: Output logs to stdout only, not to `llama.log`. Default: enabled
 - `--log-format FORMAT`: Define the log output to FORMAT: json or text Default: `json`
+- `--rope-scaling` : RoPE scaling method. Defaults to linear unless otherwise specified by the model. Options are `none`, `linear`, `yarn`
+- `--rope-freq-base N` : RoPE frequency base (default: loaded from model)
+- `--rope-freq-scale N`: RoPE frequency scaling factor, expands context by a factor of 1/N (e.g. 0.25)
+- `--yarn-ext-factor N` : YaRN: extrapolation mix factor (Default: 1.0, 0.0 = full interpolation)
+- `--yarn-attn-factor N` : YaRN: scale sqrt(t) or attention magnitude (default: 1.0)
+- `--yarn-beta-slow N`: YaRN: High correction dim or alpha (default: 1.0)
+- `--yarn-beta-fast N`: YaRN: low correction dim or beta (default: 32.0)
+- `--pooling` : Pooling type for embeddings, use model default if unspecified. Options are `none`, `mean`, `cls`
+- `-dt N`, `--defrag-thold N`: KV cache defragmentation threshold (default: -1.0, < 0 = disabled)
+- `-fa`, `--flash-attn` : enable flash attention (default: disabled).
+- `-ctk TYPE`, `--cache-type-k TYPE` : KV cache data type for K (default: `f16`, options `f32`, `f16`, `q8_0`, `q4_0`, `q4_1`, `iq4_nl`, `q5_0`, or `q5_1`)
+- `-ctv TYPE`, `--cache-type-v TYPE` : KV cache type for V (default `f16`, see `-ctk` for options)
 
 **If compiled with `LLAMA_SERVER_SSL=ON`**
 - `--ssl-key-file FNAME`: path to file a PEM-encoded SSL private key
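
Note on `--rope-freq-scale`: the added text says a scale of N expands the context by a factor of 1/N. A minimal sketch of that arithmetic follows. The function name `expanded_context` and the 4096-token training context are illustrative assumptions, not taken from the patch or from llama.cpp.

```python
def expanded_context(n_ctx_train: int, rope_freq_scale: float) -> int:
    """Context length implied by a RoPE frequency scale, per the README wording.

    Hypothetical helper: a scale of N expands the context by a factor of 1/N.
    """
    if rope_freq_scale <= 0:
        raise ValueError("rope_freq_scale must be positive")
    return round(n_ctx_train / rope_freq_scale)


if __name__ == "__main__":
    # Assumed example: a model trained with a 4096-token context and the
    # README's example scale of 0.25, which gives a 4x expansion.
    print(expanded_context(4096, 0.25))
```

With the README's example value of 0.25, the expansion factor is 1/0.25 = 4.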