summaryrefslogtreecommitdiff
path: root/examples/server
diff options
context:
space:
mode:
Diffstat (limited to 'examples/server')
-rw-r--r--examples/server/tests/features/results.feature17
1 files changed, 8 insertions, 9 deletions
diff --git a/examples/server/tests/features/results.feature b/examples/server/tests/features/results.feature
index 4ab8ad20..e8e1b541 100644
--- a/examples/server/tests/features/results.feature
+++ b/examples/server/tests/features/results.feature
@@ -13,7 +13,7 @@ Feature: Results
Scenario Outline: consistent results with same seed
Given <n_slots> slots
- And 0.0 temperature
+ And 1.0 temperature
Then the server is starting
Then the server is healthy
@@ -27,7 +27,8 @@ Feature: Results
Examples:
| n_slots |
| 1 |
- | 2 |
+ # FIXME: unified KV cache nondeterminism
+ # | 2 |
Scenario Outline: different results with different seed
Given <n_slots> slots
@@ -73,14 +74,13 @@ Feature: Results
Examples:
| n_parallel | temp |
| 1 | 0.0 |
- | 2 | 0.0 |
- | 4 | 0.0 |
| 1 | 1.0 |
- # FIXME: These tests fail on master.
- # Problems: unified KV cache (except for CPU backend with LLAMA_NO_LLAMAFILE=1), SIMD nondeterminism.
+ # FIXME: unified KV cache nondeterminism
# See https://github.com/ggerganov/whisper.cpp/issues/1941#issuecomment-1986923227
# and https://github.com/ggerganov/llama.cpp/pull/6122#discussion_r1531405574
# and https://github.com/ggerganov/llama.cpp/pull/7347 .
+ # | 2 | 0.0 |
+ # | 4 | 0.0 |
# | 2 | 1.0 |
# | 4 | 1.0 |
@@ -108,12 +108,11 @@ Feature: Results
Examples:
| n_slots | n_kv | n_predict | n_parallel |
| 4 | 1024 | 1 | 1 |
- | 4 | 1024 | 1 | 4 |
- # FIXME: These tests fail on master.
- # Problems: unified KV cache (except for CPU backend with LLAMA_NO_LLAMAFILE=1), SIMD nondeterminism.
+ # FIXME: unified KV cache nondeterminism
# See https://github.com/ggerganov/whisper.cpp/issues/1941#issuecomment-1986923227
# and https://github.com/ggerganov/llama.cpp/pull/6122#discussion_r1531405574
# and https://github.com/ggerganov/llama.cpp/pull/7347 .
+ # | 4 | 1024 | 1 | 4 |
# | 4 | 1024 | 100 | 1 |
# This test still fails even the above patches; the first token probabilities are already different.
# | 4 | 1024 | 100 | 4 |