summaryrefslogtreecommitdiff
path: root/examples/server/bench
AgeCommit message (Collapse)Author
2024-03-27server: continuous performance monitoring and PR comment (#6283)Pierrick Hymbert
* server: bench: init * server: bench: reduce list of GPU nodes * server: bench: fix graph, fix output artifact * ci: bench: add mermaid in case of image cannot be uploaded * ci: bench: more resilient, more metrics * ci: bench: trigger build * ci: bench: fix duration * ci: bench: fix typo * ci: bench: fix mermaid values, markdown generated * typo on the step name Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com> * ci: bench: trailing spaces * ci: bench: move images in a details section * ci: bench: reduce bullet point size --------- Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
2024-03-09server: benchmark: chat/completions scenario and other llm servers ↵Pierrick Hymbert
comparison (#5941) * server: bench: Init a bench scenario with K6 See #5827 * server: bench: EOL EOF * server: bench: PR feedback and improved k6 script configuration * server: bench: remove llamacpp_completions_tokens_seconds as it include prompt processing time and it's misleading server: bench: add max_tokens from SERVER_BENCH_MAX_TOKENS server: bench: increase truncated rate to 80% before failing * server: bench: fix doc * server: bench: change gauge custom metrics to trend * server: bench: change gauge custom metrics to trend server: bench: add trend custom metrics for total tokens per second average * server: bench: doc add an option to debug http request * server: bench: filter dataset too short and too long sequences * server: bench: allow to filter out conversation in the dataset based on env variable * server: bench: fix assistant message sent instead of user message * server: bench: fix assistant message sent instead of user message * server : add defrag thold parameter * server: bench: select prompts based on the current iteration id not randomly to make the bench more reproducible --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>