summaryrefslogtreecommitdiff
path: root/examples/main/README.md
diff options
context:
space:
mode:
Diffstat (limited to 'examples/main/README.md')
-rw-r--r--examples/main/README.md4
1 files changed, 2 insertions, 2 deletions
diff --git a/examples/main/README.md b/examples/main/README.md
index 35f87bcd..7c03f92c 100644
--- a/examples/main/README.md
+++ b/examples/main/README.md
@@ -270,9 +270,9 @@ These options help improve the performance and memory usage of the LLaMA models.
- `-b N, --batch_size N`: Set the batch size for prompt processing (default: 512). This large batch size benefits users who have BLAS installed and enabled it during the build. If you don't have BLAS enabled ("BLAS=0"), you can use a smaller number, such as 8, to see the prompt progress as it's evaluated in some situations.
-### Session Caching
+### Prompt Caching
-- `--session FNAME`: Specify a file to load/save the session, which caches the model state after the initial prompt. This can significantly speed up the startup time when you're using longer prompts. The session file is created during the first run and is reused in subsequent runs. If you change your prompt such that 75% or less of the session is reusable, the existing session file will be overwritten with a new, updated version to maintain optimal performance.
+- `--prompt-cache FNAME`: Specify a file to cache the model state after the initial prompt. This can significantly speed up the startup time when you're using longer prompts. The file is created during the first run and is reused and updated in subsequent runs.
### Quantization