Diffstat (limited to 'examples/server/tests/features/slotsave.feature')
-rw-r--r-- examples/server/tests/features/slotsave.feature | 58 ++++++++++++++++++++
 1 file changed, 58 insertions(+), 0 deletions(-)
diff --git a/examples/server/tests/features/slotsave.feature b/examples/server/tests/features/slotsave.feature
new file mode 100644
index 00000000..1c281c07
--- /dev/null
+++ b/examples/server/tests/features/slotsave.feature
@@ -0,0 +1,58 @@
+@llama.cpp
+@slotsave
+Feature: llama.cpp server slot management
+
+ Background: Server startup
+ Given a server listening on localhost:8080
+ And a model file tinyllamas/stories260K.gguf from HF repo ggml-org/models
+ And prompt caching is enabled
+ And 2 slots
+ And . as slot save path
+ And 2048 KV cache size
+ And 42 as server seed
+ And 24 max tokens to predict
+ Then the server is starting
+ Then the server is healthy
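+ # Note: the step definitions map the Background above onto server CLI
+ # flags; assuming current flag names, roughly: "2 slots" -> --parallel 2,
+ # ". as slot save path" -> --slot-save-path ., "2048 KV cache size" ->
+ # --ctx-size 2048, "42 as server seed" -> --seed 42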
+
+ Scenario: Save and Restore Slot
+ # First prompt in slot 1 should be fully processed
+ Given a user prompt "What is the capital of France?"
+ And using slot id 1
+ And a completion request with no api error
+ Then 24 tokens are predicted matching (Lily|cake)
+ And 22 prompt tokens are processed
+ When the slot 1 is saved with filename "slot1.bin"
+ Then the server responds with status code 200
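+ # A minimal sketch of the request behind the save step, assuming the
+ # server's /slots endpoint (slot id and filename taken from the steps):
+ #
+ #   import requests
+ #   r = requests.post("http://localhost:8080/slots/1",
+ #                     params={"action": "save"},
+ #                     json={"filename": "slot1.bin"})
+ #   assert r.status_code == 200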
+ # Since the common prefix is cached, only the trailing tokens that differ are processed
+ Given a user prompt "What is the capital of Germany?"
+ And a completion request with no api error
+ Then 24 tokens are predicted matching (Thank|special)
+ And 7 prompt tokens are processed
+ # Restoring the original cache into slot 0 should process only 1 prompt
+ # token and reproduce the first completion
+ When the slot 0 is restored with filename "slot1.bin"
+ Then the server responds with status code 200
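+ # Restore is the same call with action=restore, again assuming the
+ # /slots endpoint:
+ #
+ #   r = requests.post("http://localhost:8080/slots/0",
+ #                     params={"action": "restore"},
+ #                     json={"filename": "slot1.bin"})
+ #   assert r.status_code == 200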
+ Given a user prompt "What is the capital of France?"
+ And using slot id 0
+ And a completion request with no api error
+ Then 24 tokens are predicted matching (Lily|cake)
+ And 1 prompt tokens are processed
+ # Verify that slot 1 was not corrupted by the restore into slot 0: same prompt, same result
+ Given a user prompt "What is the capital of Germany?"
+ And using slot id 1
+ And a completion request with no api error
+ Then 24 tokens are predicted matching (Thank|special)
+ And 1 prompt tokens are processed
+
+ Scenario: Erase Slot
+ Given a user prompt "What is the capital of France?"
+ And using slot id 1
+ And a completion request with no api error
+ Then 24 tokens are predicted matching (Lily|cake)
+ And 22 prompt tokens are processed
+ When the slot 1 is erased
+ Then the server responds with status code 200
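+ # Erase takes no filename; a sketch assuming the same endpoint:
+ #
+ #   r = requests.post("http://localhost:8080/slots/1",
+ #                     params={"action": "erase"})
+ #   assert r.status_code == 200
+ # With the cache erased, the full 22-token prompt must be processed again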
+ Given a user prompt "What is the capital of France?"
+ And a completion request with no api error
+ Then 24 tokens are predicted matching (Lily|cake)
+ And 22 prompt tokens are processed