summaryrefslogtreecommitdiff
path: root/examples/server/tests/features/results.feature
diff options
context:
space:
mode:
Diffstat (limited to 'examples/server/tests/features/results.feature')
-rw-r--r--examples/server/tests/features/results.feature88
1 files changed, 56 insertions, 32 deletions
diff --git a/examples/server/tests/features/results.feature b/examples/server/tests/features/results.feature
index f17120f7..aa0b8d0c 100644
--- a/examples/server/tests/features/results.feature
+++ b/examples/server/tests/features/results.feature
@@ -7,44 +7,16 @@ Feature: Results
And a model file tinyllamas/split/stories15M-00001-of-00003.gguf from HF repo ggml-org/models
And a model file test-model-00001-of-00003.gguf
And 128 as batch size
- And 256 KV cache size
+ And 1024 KV cache size
And 128 max tokens to predict
+ And continuous batching
- Scenario Outline: Multi users completion
+ Scenario Outline: consistent results with same seed
Given <n_slots> slots
- And continuous batching
Then the server is starting
Then the server is healthy
- Given 42 as seed
- And a prompt:
- """
- Write a very long story about AI.
- """
-
- Given 42 as seed
- And a prompt:
- """
- Write a very long story about AI.
- """
-
- Given 42 as seed
- And a prompt:
- """
- Write a very long story about AI.
- """
-
- Given 42 as seed
- And a prompt:
- """
- Write a very long story about AI.
- """
-
- Given 42 as seed
- And a prompt:
- """
- Write a very long story about AI.
- """
+ Given 4 prompts "Title: Little Red Riding Hood But In Space\n\nSummary:" with seed 42
Given concurrent completion requests
Then the server is busy
@@ -55,3 +27,55 @@ Feature: Results
| n_slots |
| 1 |
| 2 |
+
+ Scenario Outline: different results with different seed
+ Given <n_slots> slots
+ Then the server is starting
+ Then the server is healthy
+
+ Given 1 prompts "Title: Little Red Riding Hood But In Space\n\nSummary:" with seed 42
+ Given 1 prompts "Title: Little Red Riding Hood But In Space\n\nSummary:" with seed 43
+ Given 1 prompts "Title: Little Red Riding Hood But In Space\n\nSummary:" with seed 44
+ Given 1 prompts "Title: Little Red Riding Hood But In Space\n\nSummary:" with seed 45
+
+ Given concurrent completion requests
+ Then the server is busy
+ Then the server is idle
+ And all slots are idle
+ Then all predictions are different
+ Examples:
+ | n_slots |
+ | 1 |
+ | 2 |
+
+ Scenario Outline: consistent results with same seed and varying batch size
+ Given 4 slots
+ And <temp> temperature
+ # And 0 as draft
+ Then the server is starting
+ Then the server is healthy
+
+ Given 1 prompts "Write a very long story about AI." with seed 42
+ And concurrent completion requests
+ # Then the server is busy # Not all slots will be utilized.
+ Then the server is idle
+ And all slots are idle
+
+ Given <n_parallel> prompts "Write a very long story about AI." with seed 42
+ And concurrent completion requests
+ # Then the server is busy # Not all slots will be utilized.
+ Then the server is idle
+ And all slots are idle
+
+ Then all predictions are equal
+ Examples:
+ | n_parallel | temp |
+ | 1 | 0.0 |
+ | 2 | 0.0 |
+ | 4 | 0.0 |
+ | 1 | 1.0 |
+ # FIXME: These tests fail on master. The problem seems to be the unified KV cache.
+ # See https://github.com/ggerganov/whisper.cpp/issues/1941#issuecomment-1986923227
+ # and https://github.com/ggerganov/llama.cpp/pull/6122#discussion_r1531405574 .
+ # | 2 | 1.0 |
+ # | 4 | 1.0 |