Diffstat (limited to 'examples/server/tests/features/results.feature')
-rw-r--r-- examples/server/tests/features/results.feature 52
1 files changed, 44 insertions, 8 deletions
diff --git a/examples/server/tests/features/results.feature b/examples/server/tests/features/results.feature
index aa0b8d0c..5deb278c 100644
--- a/examples/server/tests/features/results.feature
+++ b/examples/server/tests/features/results.feature
@@ -70,12 +70,48 @@ Feature: Results
Then all predictions are equal
Examples:
| n_parallel | temp |
- | 1 | 0.0 |
- | 2 | 0.0 |
- | 4 | 0.0 |
- | 1 | 1.0 |
- # FIXME: These tests fail on master. The problem seems to be the unified KV cache.
+ | 1 | 0.0 |
+ | 2 | 0.0 |
+ | 4 | 0.0 |
+ | 1 | 1.0 |
+ # FIXME: These tests fail on master.
+ # Problems: unified KV cache (except for CPU backend with LLAMA_NO_LLAMAFILE=1), SIMD nondeterminism.
# See https://github.com/ggerganov/whisper.cpp/issues/1941#issuecomment-1986923227
- # and https://github.com/ggerganov/llama.cpp/pull/6122#discussion_r1531405574 .
- # | 2 | 1.0 |
- # | 4 | 1.0 |
+ # and https://github.com/ggerganov/llama.cpp/pull/6122#discussion_r1531405574
+ # and https://github.com/ggerganov/llama.cpp/pull/7347 .
+ # | 2 | 1.0 |
+ # | 4 | 1.0 |
+
+ Scenario Outline: consistent token probs with same seed and prompt
+ Given <n_slots> slots
+ And <n_kv> KV cache size
+ And 1.0 temperature
+ And <n_predict> max tokens to predict
+ Then the server is starting
+ Then the server is healthy
+
+ Given 1 prompts "The meaning of life is" with seed 42
+ And concurrent completion requests
+ # Then the server is busy # Not all slots will be utilized.
+ Then the server is idle
+ And all slots are idle
+
+ Given <n_parallel> prompts "The meaning of life is" with seed 42
+ And concurrent completion requests
+ # Then the server is busy # Not all slots will be utilized.
+ Then the server is idle
+ And all slots are idle
+
+ Then all token probabilities are equal
+ Examples:
+ | n_slots | n_kv | n_predict | n_parallel |
+ | 4 | 1024 | 1 | 1 |
+ | 4 | 1024 | 1 | 4 |
+ # FIXME: These tests fail on master.
+ # Problems: unified KV cache (except for CPU backend with LLAMA_NO_LLAMAFILE=1), SIMD nondeterminism.
+ # See https://github.com/ggerganov/whisper.cpp/issues/1941#issuecomment-1986923227
+ # and https://github.com/ggerganov/llama.cpp/pull/6122#discussion_r1531405574
+ # and https://github.com/ggerganov/llama.cpp/pull/7347 .
+ # | 4 | 1024 | 100 | 1 |
+ # This test still fails even with the above patches; the first token probabilities already differ.
+ # | 4 | 1024 | 100 | 4 |
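
Note on the new "all token probabilities are equal" step: the diff does not show the step's implementation, so the following is only a minimal sketch of the kind of check it implies. The function name `all_token_probs_equal` and the data shape (one completion as a list of per-position token-to-probability maps) are assumptions for illustration, not the server's actual response format.

```python
from typing import Dict, List

# Hypothetical shape: one completion is a list with one entry per generated
# position, each entry mapping candidate token text to its probability.
Completion = List[Dict[str, float]]


def all_token_probs_equal(completions: List[Completion]) -> bool:
    """Return True if every completion matches the first one exactly.

    Exact equality is used on purpose: the scenario tests determinism for a
    fixed seed and prompt, so any difference counts as a failure.
    """
    if not completions:
        return True
    reference = completions[0]
    return all(other == reference for other in completions[1:])
```

With exact comparison, even a small floating-point drift between slots (for example from the SIMD nondeterminism mentioned in the FIXME comments) would make the check fail, which matches the intent of the commented-out rows.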