diff options
author | Pierrick Hymbert <pierrick.hymbert@gmail.com> | 2024-04-06 05:40:47 +0200 |
---|---|---|
committer | GitHub <noreply@github.com> | 2024-04-06 05:40:47 +0200 |
commit | 75cd4c77292034ecec587ecb401366f57338f7c0 (patch) | |
tree | de137718780505410bc75ce219f4bc164961c4fd /examples/server/bench/README.md | |
parent | a8bd14d55717754a1f48313a846a2b16fa998ad2 (diff) |
ci: bench: support sse and fix prompt processing time / server: add tokens usage in stream OAI response (#6495)
* ci: bench: support sse and fix prompt processing time
server: add tokens usage in stream mode
* ci: bench: README.md EOL
* ci: bench: remove total pp and tg as it is not accurate
* ci: bench: fix case when there is no token generated
* ci: bench: change to the 95 percentile for pp and tg as it is closer to what the server exports in metrics
* ci: bench: fix finish reason rate
Diffstat (limited to 'examples/server/bench/README.md')
-rw-r--r-- | examples/server/bench/README.md | 42 |
1 files changed, 37 insertions, 5 deletions
diff --git a/examples/server/bench/README.md b/examples/server/bench/README.md index a53ad64d..23a3ec97 100644 --- a/examples/server/bench/README.md +++ b/examples/server/bench/README.md @@ -2,13 +2,15 @@ Benchmark is using [k6](https://k6.io/). -##### Install k6 +##### Install k6 and sse extension -Follow instruction from: https://k6.io/docs/get-started/installation/ +SSE is not supported by default in k6, you have to build k6 with the [xk6-sse](https://github.com/phymbert/xk6-sse) extension. -Example for ubuntu: +Example: ```shell -snap install k6 +go install go.k6.io/xk6/cmd/xk6@latest +xk6 build master \ +--with github.com/phymbert/xk6-sse ``` #### Download a dataset @@ -46,7 +48,7 @@ server --host localhost --port 8080 \ For 500 chat completions request with 8 concurrent users during maximum 10 minutes, run: ```shell -k6 run script.js --duration 10m --iterations 500 --vus 8 +./k6 run script.js --duration 10m --iterations 500 --vus 8 ``` The benchmark values can be overridden with: @@ -86,3 +88,33 @@ K6 metrics might be compared against [server metrics](../README.md), with: ```shell curl http://localhost:8080/metrics ``` + +### Using the CI python script +The `bench.py` script does several steps: +- start the server +- define good variable for k6 +- run k6 script +- extract metrics from prometheus + +It aims to be used in the CI, but you can run it manually: + +```shell +LLAMA_SERVER_BIN_PATH=../../../cmake-build-release/bin/server python bench.py \ + --runner-label local \ + --name local \ + --branch `git rev-parse --abbrev-ref HEAD` \ + --commit `git rev-parse HEAD` \ + --scenario script.js \ + --duration 5m \ + --hf-repo ggml-org/models \ + --hf-file phi-2/ggml-model-q4_0.gguf \ + --model-path-prefix models \ + --parallel 4 \ + -ngl 33 \ + --batch-size 2048 \ + --ubatch-size 256 \ + --ctx-size 4096 \ + --n-prompts 200 \ + --max-prompt-tokens 256 \ + --max-tokens 256 +``` |