Age | Commit message | Author |
|
usage in stream OAI response (#6495)
* ci: bench: support sse and fix prompt processing time
server: add tokens usage in stream mode
* ci: bench: README.md EOL
* ci: bench: remove total pp and tg as they are not accurate
* ci: bench: fix case when there is no token generated
* ci: bench: change to the 95th percentile for pp and tg as it is closer to what the server exports in metrics
* ci: bench: fix finish reason rate
|
|
* ci: bench: change trigger path to not spawn on each PR
* ci: bench: add more file types for phi-2: q8_0 and f16.
- do not show the comment by default
* ci: bench: add seed parameter in k6 script
* ci: bench: artefact name perf job
* Add iteration in the commit status, reduce the autocomment again
* ci: bench: add per slot metric in the commit status
* Fix trailing spaces
|
|
* server: bench: init
* server: bench: reduce list of GPU nodes
* server: bench: fix graph, fix output artifact
* ci: bench: add mermaid in case the image cannot be uploaded
* ci: bench: more resilient, more metrics
* ci: bench: trigger build
* ci: bench: fix duration
* ci: bench: fix typo
* ci: bench: fix mermaid values, markdown generated
* typo on the step name
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
* ci: bench: trailing spaces
* ci: bench: move images into a details section
* ci: bench: reduce bullet point size
---------
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
|