author | Georgi Gerganov <ggerganov@gmail.com> | 2024-05-20 15:10:03 +0300 |
---|---|---|
committer | GitHub <noreply@github.com> | 2024-05-20 22:10:03 +1000 |
commit | 3bc10cb485dd7efa4da6c64e73ad0c9e2bfe0821 (patch) | |
tree | 813e222908ea8854e51913fadc2d5ae1efbd97ef /examples/server | |
parent | 6bf9b66fa3f263ca2175dcb5f6d0a658581e1dfb (diff) |
server : fix temperature + disable some tests (#7409)
* server : fix temperature
* server : disable tests relying on parallel determinism
* ci : change server Debug -> RelWithDebInfo
Diffstat (limited to 'examples/server')
-rw-r--r-- | examples/server/tests/features/results.feature | 17 |
1 files changed, 8 insertions, 9 deletions
diff --git a/examples/server/tests/features/results.feature b/examples/server/tests/features/results.feature
index 4ab8ad20..e8e1b541 100644
--- a/examples/server/tests/features/results.feature
+++ b/examples/server/tests/features/results.feature
@@ -13,7 +13,7 @@ Feature: Results
 
   Scenario Outline: consistent results with same seed
     Given <n_slots> slots
-    And 0.0 temperature
+    And 1.0 temperature
     Then the server is starting
     Then the server is healthy
 
@@ -27,7 +27,8 @@ Feature: Results
     Examples:
       | n_slots |
       | 1 |
-      | 2 |
+      # FIXME: unified KV cache nondeterminism
+      # | 2 |
 
   Scenario Outline: different results with different seed
     Given <n_slots> slots
@@ -73,14 +74,13 @@ Feature: Results
     Examples:
       | n_parallel | temp |
       | 1 | 0.0 |
-      | 2 | 0.0 |
-      | 4 | 0.0 |
       | 1 | 1.0 |
-      # FIXME: These tests fail on master.
-      # Problems: unified KV cache (except for CPU backend with LLAMA_NO_LLAMAFILE=1), SIMD nondeterminism.
+      # FIXME: unified KV cache nondeterminism
       # See https://github.com/ggerganov/whisper.cpp/issues/1941#issuecomment-1986923227
       # and https://github.com/ggerganov/llama.cpp/pull/6122#discussion_r1531405574
       # and https://github.com/ggerganov/llama.cpp/pull/7347 .
+      # | 2 | 0.0 |
+      # | 4 | 0.0 |
       # | 2 | 1.0 |
       # | 4 | 1.0 |
 
@@ -108,12 +108,11 @@ Feature: Results
     Examples:
       | n_slots | n_kv | n_predict | n_parallel |
       | 4 | 1024 | 1 | 1 |
-      | 4 | 1024 | 1 | 4 |
-      # FIXME: These tests fail on master.
-      # Problems: unified KV cache (except for CPU backend with LLAMA_NO_LLAMAFILE=1), SIMD nondeterminism.
+      # FIXME: unified KV cache nondeterminism
       # See https://github.com/ggerganov/whisper.cpp/issues/1941#issuecomment-1986923227
       # and https://github.com/ggerganov/llama.cpp/pull/6122#discussion_r1531405574
       # and https://github.com/ggerganov/llama.cpp/pull/7347 .
+      # | 4 | 1024 | 1 | 4 |
       # | 4 | 1024 | 100 | 1 |
       # This test still fails even the above patches; the first token probabilities are already different.
       # | 4 | 1024 | 100 | 4 |