From efc72253f7987ed7bdc8bde9d9fa5c7cac2f6292 Mon Sep 17 00:00:00 2001 From: Jorge A <161275481+jorgealias@users.noreply.github.com> Date: Wed, 28 Feb 2024 01:39:15 -0700 Subject: server : add "/chat/completions" alias for "/v1/...` (#5722) * Add "/chat/completions" as alias for "/v1/chat/completions" * merge to upstream master * minor : fix trailing whitespace --------- Co-authored-by: Georgi Gerganov --- examples/server/tests/features/parallel.feature | 22 ++++++++++++++++++++++ 1 file changed, 22 insertions(+) (limited to 'examples/server/tests/features/parallel.feature') diff --git a/examples/server/tests/features/parallel.feature b/examples/server/tests/features/parallel.feature index c85f9de1..5f895cf9 100644 --- a/examples/server/tests/features/parallel.feature +++ b/examples/server/tests/features/parallel.feature @@ -54,6 +54,28 @@ Feature: Parallel | disabled | 128 | | enabled | 64 | + Scenario Outline: Multi users OAI completions compatibility no v1 + Given a system prompt You are a writer. + And a model tinyllama-2 + Given a prompt: + """ + Write a very long book. + """ + And a prompt: + """ + Write another a poem. + """ + And max tokens to predict + And streaming is + Given concurrent OAI completions requests no v1 + Then the server is busy + Then the server is idle + Then all prompts are predicted with tokens + Examples: + | streaming | n_predict | + | disabled | 128 | + | enabled | 64 | + Scenario: Multi users with total number of tokens to predict exceeds the KV Cache size #3969 Given a prompt: """ -- cgit v1.2.3