Diffstat (limited to 'examples/server/tests/features/server.feature')
-rw-r--r-- | examples/server/tests/features/server.feature | 69
1 file changed, 69 insertions, 0 deletions
diff --git a/examples/server/tests/features/server.feature b/examples/server/tests/features/server.feature
new file mode 100644
index 00000000..fedcfe5a
--- /dev/null
+++ b/examples/server/tests/features/server.feature
@@ -0,0 +1,69 @@
+@llama.cpp
+Feature: llama.cpp server
+
+  Background: Server startup
+    Given a server listening on localhost:8080
+    And a model file stories260K.gguf
+    And a model alias tinyllama-2
+    And 42 as server seed
+    # KV Cache corresponds to the total amount of tokens
+    # that can be stored across all independent sequences: #4130
+    # see --ctx-size and #5568
+    And 32 KV cache size
+    And 1 slots
+    And embeddings extraction
+    And 32 server max tokens to predict
+    Then the server is starting
+    Then the server is healthy
+
+  Scenario: Health
+    Then the server is ready
+    And all slots are idle
+
+  Scenario Outline: Completion
+    Given a prompt <prompt>
+    And <n_predict> max tokens to predict
+    And a completion request with no api error
+    Then <n_predicted> tokens are predicted matching <re_content>
+
+    Examples: Prompts
+      | prompt                           | n_predict | re_content                   | n_predicted |
+      | I believe the meaning of life is | 8         | read                         | 8           |
+      | Write a joke about AI            | 64        | (park<or>friends<or>scared)+ | 32          |
+
+  Scenario Outline: OAI Compatibility
+    Given a model <model>
+    And a system prompt <system_prompt>
+    And a user prompt <user_prompt>
+    And <max_tokens> max tokens to predict
+    And streaming is <enable_streaming>
+    Given an OAI compatible chat completions request with no api error
+    Then <n_predicted> tokens are predicted matching <re_content>
+
+    Examples: Prompts
+      | model        | system_prompt               | user_prompt                          | max_tokens | re_content                 | n_predicted | enable_streaming |
+      | llama-2      | Book                        | What is the best book                | 8          | (Mom<or>what)+             | 8           | disabled         |
+      | codellama70b | You are a coding assistant. | Write the fibonacci function in c++. | 64         | (thanks<or>happy<or>bird)+ | 32          | enabled          |
+
+  Scenario: Embedding
+    When embeddings are computed for:
+    """
+    What is the capital of Bulgaria ?
+    """
+    Then embeddings are generated
+
+  Scenario: OAI Embeddings compatibility
+    Given a model tinyllama-2
+    When an OAI compatible embeddings computation request for:
+    """
+    What is the capital of Spain ?
+    """
+    Then embeddings are generated
+
+
+  Scenario: Tokenize / Detokenize
+    When tokenizing:
+    """
+    What is the capital of France ?
+    """
+    Then tokens can be detokenize
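Each step above is plain English only by convention: every line must be bound to a step definition in the test harness, and the directory layout suggests the suite is driven by Python behave. Below is a minimal sketch of how two of these steps could be bound; the pattern wording, context.base_url, and the /health check are illustrative assumptions, not the actual bindings that ship alongside this feature file.

    # Hypothetical behave step definitions for the feature above.
    # Assumes Python behave drives the suite; the real bindings live
    # in the steps module next to the feature file and may differ.
    import requests  # assumed HTTP client for talking to the server
    from behave import step

    @step('a server listening on {hostname}:{port}')
    def step_server_listening(context, hostname, port):
        # Remember the base URL so later steps can issue requests against it.
        context.base_url = f'http://{hostname}:{port}'

    @step('the server is healthy')
    def step_server_healthy(context):
        # Poll the server's health endpoint and fail the scenario on error.
        response = requests.get(f'{context.base_url}/health')
        assert response.status_code == 200

Running behave from the tests directory would then execute every scenario tagged @llama.cpp, failing whenever a step is unbound or an assertion raises. Note also that the <or> placeholder in the re_content cells is presumably expanded to a regex alternation (|) by the matching step, since a literal | would be read as a cell separator in a Gherkin examples table.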