author | Pierrick Hymbert <pierrick.hymbert@gmail.com> | 2024-03-09 23:41:49 +0100 |
---|---|---|
committer | GitHub <noreply@github.com> | 2024-03-09 23:41:49 +0100 |
commit | 621e86b331f8b0e71f79fd82a4ae1cd54c3e4396 (patch) | |
tree | e667aa693df722aafbb5452054de261839d0dac1 /common/grammar-parser.cpp | |
parent | 77d1ac7e00bf049b9f2bba1b5a310a78318c49c4 (diff) | |
server: benchmark: chat/completions scenario and other llm servers comparison (#5941)
* server: bench: Init a bench scenario with K6 (see the k6 script sketch after this commit message)
See #5827
* server: bench: EOL EOF
* server: bench: PR feedback and improved k6 script configuration
* server: bench: remove llamacpp_completions_tokens_seconds as it includes prompt processing time and is misleading
server: bench: add max_tokens from SERVER_BENCH_MAX_TOKENS
server: bench: increase the allowed truncated rate to 80% before failing
* server: bench: fix doc
* server: bench: change gauge custom metrics to trend
* server: bench: change gauge custom metrics to trend
server: bench: add trend custom metrics for total tokens per second average
* server: bench: doc add an option to debug http request
* server: bench: filter dataset too short and too long sequences
* server: bench: allow filtering out conversations in the dataset based on an env variable
* server: bench: fix assistant message sent instead of user message
* server: bench: fix assistant message sent instead of user message
* server : add defrag thold parameter
* server: bench: select prompts based on the current iteration id, not randomly, to make the bench more reproducible
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
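Taken together, the bullets above describe a k6 chat/completions benchmark: prompts are drawn deterministically from a filtered dataset, max_tokens comes from SERVER_BENCH_MAX_TOKENS, and custom Trend/Rate metrics track throughput and truncation. Below is a minimal sketch of such a script; apart from SERVER_BENCH_MAX_TOKENS and the /chat/completions endpoint, the env variables, metric names, dataset path, and response fields are illustrative assumptions, not necessarily the identifiers used by the script added in this PR.

```js
// Minimal k6 sketch of the scenario described above (assumptions noted inline).
import http from 'k6/http';
import { check } from 'k6';
import exec from 'k6/execution';
import { SharedArray } from 'k6/data';
import { Trend, Rate } from 'k6/metrics';

// Benchmark knobs come from the environment so runs are easy to reproduce;
// SERVER_BENCH_URL and the default values are illustrative.
const serverUrl = __ENV.SERVER_BENCH_URL || 'http://localhost:8080/v1';
const maxTokens = parseInt(__ENV.SERVER_BENCH_MAX_TOKENS || '512', 10);

// Load the dataset once per test and drop prompts that are too short or too long
// (character length is used here as a rough proxy for sequence length; the
// dataset path and field names are hypothetical).
const dataset = new SharedArray('conversations', function () {
  return JSON.parse(open('./dataset.json'))
    .filter((c) => c.prompt.length >= 16 && c.prompt.length <= 1024);
});

// Trend/Rate custom metrics (rather than gauges) so k6 reports avg/min/max/percentiles.
const totalTokensPerSecond = new Trend('total_tokens_per_second');
const completionsTruncated = new Rate('completions_truncated_rate');

export const options = {
  vus: 8,
  duration: '1m',
  thresholds: {
    // Tolerate up to 80% truncated completions before the run is marked as failed.
    completions_truncated_rate: ['rate<0.8'],
  },
};

export default function () {
  // Select the prompt from the current iteration id, not randomly, for reproducibility.
  const conv = dataset[exec.scenario.iterationInTest % dataset.length];

  const res = http.post(`${serverUrl}/chat/completions`, JSON.stringify({
    // The prompt is sent as a user message, not an assistant message.
    messages: [{ role: 'user', content: conv.prompt }],
    max_tokens: maxTokens,
  }), { headers: { 'Content-Type': 'application/json' } });

  if (!check(res, { 'completion succeeded': (r) => r.status === 200 })) {
    return;
  }

  const body = res.json();
  // finish_reason === 'length' means the completion was cut off by max_tokens.
  completionsTruncated.add(body.choices[0].finish_reason === 'length');
  // res.timings.duration is in milliseconds; usage fields assume an OpenAI-style schema.
  totalTokensPerSecond.add(body.usage.total_tokens / (res.timings.duration / 1000.0));
}
```

A script like this could be run with something like `k6 run script.js -e SERVER_BENCH_MAX_TOKENS=256`, since k6 exposes `-e` definitions (and system environment variables) to the script through `__ENV`.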
Diffstat (limited to 'common/grammar-parser.cpp')
0 files changed, 0 insertions, 0 deletions