Diffstat (limited to 'examples/server/tests/features/server.feature'):

 examples/server/tests/features/server.feature | 23 +++++++++++++++--------
 1 file changed, 15 insertions(+), 8 deletions(-)
diff --git a/examples/server/tests/features/server.feature b/examples/server/tests/features/server.feature
index b571582a..7c977bcc 100644
--- a/examples/server/tests/features/server.feature
+++ b/examples/server/tests/features/server.feature
@@ -1,15 +1,17 @@
 @llama.cpp
+@server
 Feature: llama.cpp server
 
   Background: Server startup
     Given a server listening on localhost:8080
-    And a model file stories260K.gguf
+    And a model file tinyllamas/stories260K.gguf from HF repo ggml-org/models
     And a model alias tinyllama-2
     And 42 as server seed
     # KV Cache corresponds to the total amount of tokens
     # that can be stored across all independent sequences: #4130
     # see --ctx-size and #5568
     And 32 KV cache size
+    And 512 as batch size
     And 1 slots
     And embeddings extraction
     And 32 server max tokens to predict
@@ -29,9 +31,9 @@ Feature: llama.cpp server
     And prometheus metrics are exposed
 
     Examples: Prompts
-      | prompt                           | n_predict | re_content                             | n_predicted |
-      | I believe the meaning of life is | 8         | (read<or>going)+                       | 8           |
-      | Write a joke about AI            | 64        | (park<or>friends<or>scared<or>always)+ | 32          |
+      | prompt                           | n_predict | re_content                       | n_predicted |
+      | I believe the meaning of life is | 8         | (read\|going)+                   | 8           |
+      | Write a joke about AI            | 64        | (park\|friends\|scared\|always)+ | 32          |
 
   Scenario Outline: OAI Compatibility
     Given a model <model>
@@ -43,9 +45,9 @@ Feature: llama.cpp server
     Then <n_predicted> tokens are predicted matching <re_content>
 
     Examples: Prompts
-      | model        | system_prompt               | user_prompt                          | max_tokens | re_content                 | n_predicted | enable_streaming |
-      | llama-2      | Book                        | What is the best book                | 8          | (Mom<or>what)+             | 8           | disabled         |
-      | codellama70b | You are a coding assistant. | Write the fibonacci function in c++. | 64         | (thanks<or>happy<or>bird)+ | 32          | enabled          |
+      | model        | system_prompt               | user_prompt                          | max_tokens | re_content             | n_predicted | enable_streaming |
+      | llama-2      | Book                        | What is the best book                | 8          | (Mom\|what)+           | 8           | disabled         |
+      | codellama70b | You are a coding assistant. | Write the fibonacci function in c++. | 64         | (thanks\|happy\|bird)+ | 32          | enabled          |
 
   Scenario: Embedding
     When embeddings are computed for:
@@ -75,10 +77,15 @@ Feature: llama.cpp server
     When an OAI compatible embeddings computation request for multiple inputs
     Then embeddings are generated
-
 
   Scenario: Tokenize / Detokenize
     When tokenizing:
     """
     What is the capital of France ?
     """
     Then tokens can be detokenize
+
+  Scenario: Models available
+    Given available models
+    Then 1 models are supported
+    Then model 0 is identified by tinyllama-2
+    Then model 0 is trained on 128 tokens context
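The change in the Examples tables swaps the `<or>` placeholder for a properly escaped pipe (`\|`): in a Gherkin data table a bare `|` is a cell separator, so `\|` is how a literal pipe reaches the step as part of the regex. A minimal Python sketch of how such a `re_content` cell could be checked against generated text, assuming behave delivers the cell with `\|` already unescaped to `|` (the helper name is hypothetical, not the actual llama.cpp test harness):

```python
import re

# Hypothetical helper: check generated text against a re_content cell.
# By the time a step sees the cell, the Gherkin escape \| has been
# unescaped to |, so the pattern is an ordinary regex alternation.
def matches_re_content(re_content: str, content: str) -> bool:
    return re.search(re_content, content) is not None

# The first Examples row expects (read|going)+ somewhere in the output.
print(matches_re_content(r"(read|going)+", "life is going on"))   # True
print(matches_re_content(r"(read|going)+", "life is a mystery"))  # False
```

`re.search` (rather than `re.fullmatch`) reflects what the feature expresses: the pattern only needs to occur somewhere in the predicted content.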