author    Pierrick Hymbert <pierrick.hymbert@gmail.com>  2024-02-24 12:28:55 +0100
committer GitHub <noreply@github.com>  2024-02-24 12:28:55 +0100
commit    525213d2f5da1eaf4b922b6b792cb52b2c613368 (patch)
tree      8400e8a97d231b13a2df0c9d8b7c8fa945d24d5e /examples/server/tests/features/server.feature
parent    fd43d66f46ee3b5345fb8a74a252d86ccd34a409 (diff)
server: init functional tests (#5566)
* server: tests: init scenarios
  - health and slots endpoints
  - completion endpoint
  - OAI compatible chat completion requests w/ and without streaming
  - completion multi users scenario
  - multi users scenario on OAI compatible endpoint with streaming
  - multi users with total number of tokens to predict exceeding the KV Cache size
  - server wrong usage scenario, like in Infinite loop of "context shift" #3969
  - slots shifting
  - continuous batching
  - embeddings endpoint
  - multi users embedding endpoint: Segmentation fault #5655
  - OpenAI-compatible embeddings API
  - tokenize endpoint
  - CORS and api key scenario
* server: CI GitHub workflow

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Diffstat (limited to 'examples/server/tests/features/server.feature')
-rw-r--r--  examples/server/tests/features/server.feature  69
1 file changed, 69 insertions, 0 deletions
diff --git a/examples/server/tests/features/server.feature b/examples/server/tests/features/server.feature
new file mode 100644
index 00000000..fedcfe5a
--- /dev/null
+++ b/examples/server/tests/features/server.feature
@@ -0,0 +1,69 @@
+@llama.cpp
+Feature: llama.cpp server
+
+ Background: Server startup
+ Given a server listening on localhost:8080
+ And a model file stories260K.gguf
+ And a model alias tinyllama-2
+ And 42 as server seed
+ # KV Cache corresponds to the total amount of tokens
+ # that can be stored across all independent sequences: #4130
+ # see --ctx-size and #5568
+ And 32 KV cache size
+ And 1 slots
+ And embeddings extraction
+ And 32 server max tokens to predict
+ Then the server is starting
+ Then the server is healthy
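The Background above maps onto a concrete server invocation. A minimal Python sketch of the argv the test harness might build, assuming llama.cpp's server CLI flag names (`--alias`, `--parallel`, `--embedding`, `--n-predict` are inferred from the steps, not stated in this diff); only the values come from the feature file:

```python
# Hypothetical argv for the server process implied by the Background steps.
server_args = [
    "./server",
    "--host", "localhost",
    "--port", "8080",
    "-m", "stories260K.gguf",
    "--alias", "tinyllama-2",
    "--seed", "42",
    "--ctx-size", "32",   # KV cache: total tokens across all sequences
    "--parallel", "1",    # number of slots
    "--embedding",        # enable embeddings extraction
    "--n-predict", "32",  # server-wide cap on tokens to predict
]
command_line = " ".join(server_args)
```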
+
+ Scenario: Health
+ Then the server is ready
+ And all slots are idle
+
+ Scenario Outline: Completion
+ Given a prompt <prompt>
+ And <n_predict> max tokens to predict
+ And a completion request with no api error
+ Then <n_predicted> tokens are predicted matching <re_content>
+
+ Examples: Prompts
+ | prompt | n_predict | re_content | n_predicted |
+ | I believe the meaning of life is | 8 | read | 8 |
+ | Write a joke about AI | 64 | (park<or>friends<or>scared)+ | 32 |
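Each Examples row corresponds to a JSON body posted to the server's completion endpoint. A sketch of the first row's payload, assuming the `prompt` and `n_predict` field names of llama.cpp's `/completion` API:

```python
import json

# Request body for the first Examples row; the step definitions are
# assumed to POST something like this to /completion.
completion_request = {
    "prompt": "I believe the meaning of life is",
    "n_predict": 8,  # max tokens to predict for this request
}
body = json.dumps(completion_request)
```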
+
+ Scenario Outline: OAI Compatibility
+ Given a model <model>
+ And a system prompt <system_prompt>
+ And a user prompt <user_prompt>
+ And <max_tokens> max tokens to predict
+ And streaming is <enable_streaming>
+ Given an OAI compatible chat completions request with no api error
+ Then <n_predicted> tokens are predicted matching <re_content>
+
+ Examples: Prompts
+ | model | system_prompt | user_prompt | max_tokens | re_content | n_predicted | enable_streaming |
+ | llama-2 | Book | What is the best book | 8 | (Mom<or>what)+ | 8 | disabled |
+ | codellama70b | You are a coding assistant. | Write the fibonacci function in c++. | 64 | (thanks<or>happy<or>bird)+ | 32 | enabled |
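The OAI Compatibility rows translate into standard OpenAI-style chat completion bodies. A sketch of the first row, assuming the step definitions target the server's OpenAI-compatible chat endpoint:

```python
import json

# OpenAI-compatible chat completion body for the first Examples row;
# "stream": False corresponds to enable_streaming = disabled.
chat_request = {
    "model": "llama-2",
    "messages": [
        {"role": "system", "content": "Book"},
        {"role": "user", "content": "What is the best book"},
    ],
    "max_tokens": 8,
    "stream": False,
}
body = json.dumps(chat_request)
```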
+
+ Scenario: Embedding
+ When embeddings are computed for:
+ """
+ What is the capital of Bulgaria ?
+ """
+ Then embeddings are generated
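A sketch of the request behind this step, assuming it posts the doc-string text to the server's native embedding endpoint with a `content` field (the field name is an assumption, not stated in this diff):

```python
import json

# Hypothetical body for the native embedding endpoint; the "content"
# field name is assumed from llama.cpp's server API.
embedding_request = {"content": "What is the capital of Bulgaria ?"}
body = json.dumps(embedding_request)
```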
+
+ Scenario: OAI Embeddings compatibility
+ Given a model tinyllama-2
+ When an OAI compatible embeddings computation request for:
+ """
+ What is the capital of Spain ?
+ """
+ Then embeddings are generated
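The OpenAI-compatible variant differs only in shape: a `model` name plus an `input` field, matching the OpenAI embeddings API. A sketch:

```python
import json

# OpenAI-style embeddings body for the scenario above.
oai_embedding_request = {
    "model": "tinyllama-2",
    "input": "What is the capital of Spain ?",
}
body = json.dumps(oai_embedding_request)
```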
+
+
+ Scenario: Tokenize / Detokenize
+ When tokenizing:
+ """
+ What is the capital of France ?
+ """
+    Then tokens can be detokenized
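A round-trip sketch of the tokenize/detokenize pair, assuming the server's `/tokenize` endpoint takes a `content` field and `/detokenize` takes back the resulting `tokens` list (both field names and the token values below are illustrative assumptions, not taken from this diff):

```python
import json

# Hypothetical round-trip bodies for /tokenize and /detokenize.
tokenize_request = {"content": "What is the capital of France ?"}

# The tokenize response would carry a token list, which feeds the
# detokenize call; these IDs are made up for illustration.
example_tokens = [1, 1724, 338, 278, 7483]
detokenize_request = {"tokens": example_tokens}

bodies = (json.dumps(tokenize_request), json.dumps(detokenize_request))
```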