Diffstat (limited to 'examples')
-rw-r--r--  examples/llama-bench/README.md    34
-rw-r--r--  examples/sycl/win-run-llama2.bat   2
2 files changed, 22 insertions(+), 14 deletions(-)
diff --git a/examples/llama-bench/README.md b/examples/llama-bench/README.md
index d02824bf..374e40a7 100644
--- a/examples/llama-bench/README.md
+++ b/examples/llama-bench/README.md
@@ -23,19 +23,23 @@ usage: ./llama-bench [options]
options:
-h, --help
- -m, --model <filename> (default: models/7B/ggml-model-q4_0.gguf)
- -p, --n-prompt <n> (default: 512)
- -n, --n-gen <n> (default: 128)
- -b, --batch-size <n> (default: 512)
- --memory-f32 <0|1> (default: 0)
- -t, --threads <n> (default: 16)
- -ngl N, --n-gpu-layers <n> (default: 99)
- -mg i, --main-gpu <i> (default: 0)
- -mmq, --mul-mat-q <0|1> (default: 1)
- -ts, --tensor_split <ts0/ts1/..>
- -r, --repetitions <n> (default: 5)
- -o, --output <csv|json|md|sql> (default: md)
- -v, --verbose (default: 0)
+ -m, --model <filename> (default: models/7B/ggml-model-q4_0.gguf)
+ -p, --n-prompt <n> (default: 512)
+ -n, --n-gen <n> (default: 128)
+ -b, --batch-size <n> (default: 512)
+ -ctk <t>, --cache-type-k <t> (default: f16)
+ -ctv <t>, --cache-type-v <t> (default: f16)
+ -t, --threads <n> (default: 112)
+ -ngl, --n-gpu-layers <n> (default: 99)
+ -sm, --split-mode <none|layer|row> (default: layer)
+ -mg, --main-gpu <i> (default: 0)
+ -nkvo, --no-kv-offload <0|1> (default: 0)
+ -mmp, --mmap <0|1> (default: 1)
+ -mmq, --mul-mat-q <0|1> (default: 1)
+ -ts, --tensor_split <ts0/ts1/..> (default: 0)
+ -r, --repetitions <n> (default: 5)
+ -o, --output <csv|json|md|sql> (default: md)
+ -v, --verbose (default: 0)
Multiple values can be given for each parameter by separating them with ',' or by specifying the parameter multiple times.
```
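For instance, the multi-value syntax described above lets a single invocation sweep several settings. A minimal sketch, assuming the default model path from the usage listing (the swept values themselves are illustrative, not taken from the diff):

```
# Sweep two prompt lengths and three GPU-layer counts in one run
# (values are illustrative; the model path is the documented default)
./llama-bench -m models/7B/ggml-model-q4_0.gguf -p 256,512 -ngl 0,16,99 -o md
```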
@@ -51,6 +55,10 @@ Each test is repeated the number of times given by `-r`, and the results are averaged.
For a description of the other options, see the [main example](../main/README.md).
+Note:
+
+- When using the SYCL backend, the benchmark may hang in some cases. Please set `-mmp 0` as a workaround.
+
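A minimal sketch of the workaround above, again assuming the default model path from the usage listing:

```
# Disable mmap (-mmp 0) to avoid the reported hang on the SYCL backend
./llama-bench -m models/7B/ggml-model-q4_0.gguf -mmp 0
```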
## Examples
### Text generation with different models
diff --git a/examples/sycl/win-run-llama2.bat b/examples/sycl/win-run-llama2.bat
index 28d93554..cf621c67 100644
--- a/examples/sycl/win-run-llama2.bat
+++ b/examples/sycl/win-run-llama2.bat
@@ -2,7 +2,7 @@
:: Copyright (C) 2024 Intel Corporation
:: SPDX-License-Identifier: MIT
-INPUT2="Building a website can be done in 10 simple steps:\nStep 1:"
+set INPUT2="Building a website can be done in 10 simple steps:\nStep 1:"
@call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" intel64 --force