Diffstat (limited to 'examples/server/bench/README.md')
-rw-r--r-- examples/server/bench/README.md | 42
1 files changed, 37 insertions, 5 deletions
diff --git a/examples/server/bench/README.md b/examples/server/bench/README.md
index a53ad64d..23a3ec97 100644
--- a/examples/server/bench/README.md
+++ b/examples/server/bench/README.md
@@ -2,13 +2,15 @@
Benchmark is using [k6](https://k6.io/).
-##### Install k6
+##### Install k6 and the SSE extension
-Follow instruction from: https://k6.io/docs/get-started/installation/
+k6 does not support SSE (server-sent events) out of the box, so you have to build k6 with the [xk6-sse](https://github.com/phymbert/xk6-sse) extension.
-Example for ubuntu:
+Example:
```shell
-snap install k6
+go install go.k6.io/xk6/cmd/xk6@latest
+xk6 build master \
+--with github.com/phymbert/xk6-sse
```
#### Download a dataset
@@ -46,7 +48,7 @@ server --host localhost --port 8080 \
For 500 chat completions request with 8 concurrent users during maximum 10 minutes, run:
```shell
-k6 run script.js --duration 10m --iterations 500 --vus 8
+./k6 run script.js --duration 10m --iterations 500 --vus 8
```
The benchmark values can be overridden with:
@@ -86,3 +88,33 @@ K6 metrics might be compared against [server metrics](../README.md), with:
```shell
curl http://localhost:8080/metrics
```
+
+### Using the CI python script
+The `bench.py` script performs the following steps:
+- start the server
+- set the variables expected by the k6 script
+- run the k6 script
+- extract metrics from Prometheus
+
+It is intended for use in the CI, but you can also run it manually:
+
+```shell
+LLAMA_SERVER_BIN_PATH=../../../cmake-build-release/bin/server python bench.py \
+ --runner-label local \
+ --name local \
+ --branch $(git rev-parse --abbrev-ref HEAD) \
+ --commit $(git rev-parse HEAD) \
+ --scenario script.js \
+ --duration 5m \
+ --hf-repo ggml-org/models \
+ --hf-file phi-2/ggml-model-q4_0.gguf \
+ --model-path-prefix models \
+ --parallel 4 \
+ -ngl 33 \
+ --batch-size 2048 \
+ --ubatch-size 256 \
+ --ctx-size 4096 \
+ --n-prompts 200 \
+ --max-prompt-tokens 256 \
+ --max-tokens 256
+```
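The server metrics returned by `curl http://localhost:8080/metrics` use the Prometheus text exposition format, one `name value` pair per line plus `# HELP` and `# TYPE` comment lines. As a minimal sketch, assuming you only need a single unlabeled metric for a quick comparison against the k6 results, a small shell function can pull one value out of that output. The function name `metric_value` is hypothetical and is not part of `bench.py`, and the metric name in the usage line is an assumption to check against your server's actual `/metrics` output.

```shell
# Hypothetical helper (not part of bench.py): print the value of one
# unlabeled metric from Prometheus text-format input read on stdin.
# Returns exit status 1 if the metric is not found.
metric_value() {
  local name="$1"
  # '# HELP' / '# TYPE' lines never match because their first field is '#'.
  # Labeled series such as name{label="x"} are intentionally not matched.
  awk -v name="$name" '
    $1 == name { print $2; found = 1; exit }
    END { exit !found }
  '
}

# Example usage (metric name assumed; the server must expose /metrics):
# curl -s http://localhost:8080/metrics | metric_value 'llamacpp:prompt_tokens_total'
```

Matching on the exact first field keeps the sketch simple. For labeled series or aggregated values, querying Prometheus itself, as `bench.py` does, is the more robust option.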