server: concurrency fix + monitoring - add /metrics prometheus compatible endpoint (#5708)

* server: monitoring - add /metrics prometheus compatible endpoint
* server: concurrency issue, when 2 tasks are waiting for results, only one calling thread is notified
* server: metrics - move to a dedicated struct
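
The concurrency item above describes a familiar symptom: several threads wait on one condition variable, and a single-waiter notification can wake a thread whose result has not arrived yet while the intended one keeps sleeping. The Python sketch below illustrates that pattern and one common remedy (wake all waiters and let each recheck its own predicate). It is not the server's C++ code; `ResultQueue`, `send`, `recv`, and `demo` are hypothetical names chosen for illustration.

```python
import threading


class ResultQueue:
    """Hypothetical result queue: each waiter blocks until its own task id has a result."""

    def __init__(self):
        self._cond = threading.Condition()
        self._results = {}

    def send(self, task_id, result):
        with self._cond:
            self._results[task_id] = result
            # Wake every waiter; each one rechecks whether its own id is present.
            # A single notify() could wake only a waiter for a different id,
            # leaving the intended waiter asleep.
            self._cond.notify_all()

    def recv(self, task_id, timeout=5.0):
        with self._cond:
            # wait_for re-evaluates the predicate after every wake-up.
            arrived = self._cond.wait_for(
                lambda: task_id in self._results, timeout=timeout
            )
            if not arrived:
                raise TimeoutError(f"no result for task {task_id}")
            return self._results.pop(task_id)


def demo():
    """Two tasks wait concurrently; results are sent in the opposite order."""
    queue = ResultQueue()
    received = {}

    def waiter(task_id):
        received[task_id] = queue.recv(task_id)

    threads = [threading.Thread(target=waiter, args=(i,)) for i in (1, 2)]
    for t in threads:
        t.start()
    queue.send(2, "second")
    queue.send(1, "first")
    for t in threads:
        t.join()
    return received
```

Calling `demo()` returns once both waiters have received the result matching their own task id, regardless of the order in which the results were sent.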
author: Pierrick Hymbert <pierrick.hymbert@gmail.com> 2024-02-25 13:49:43 +0100
committer: GitHub <noreply@github.com> 2024-02-25 13:49:43 +0100
commit: d52d7819b8ced70c642a88a59da8c78208dc58ec (patch)
tree: 07841f1c5b7ab748bac463e62f3fb7ce0b7f96e9 /examples/server/tests/features/server.feature
parent: 12894088170f62e4cad4f8d6a3043c185b414bab (diff)
1 file changed, 2 insertions, 0 deletions
diff --git a/examples/server/tests/features/server.feature b/examples/server/tests/features/server.feature
index 5f81d256..0139f89d 100644
--- a/examples/server/tests/features/server.feature
+++ b/examples/server/tests/features/server.feature
@@ -13,6 +13,7 @@ Feature: llama.cpp server
     And   1 slots
     And   embeddings extraction
     And   32 server max tokens to predict
+    And   prometheus compatible metrics exposed
     Then  the server is starting
     Then  the server is healthy
 
@@ -25,6 +26,7 @@ Feature: llama.cpp server
     And   <n_predict> max tokens to predict
     And   a completion request with no api error
     Then  <n_predicted> tokens are predicted matching <re_content>
+    And   prometheus metrics are exposed
 
     Examples: Prompts
       | prompt                           | n_predict | re_content                   | n_predicted |
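
The new `prometheus metrics are exposed` step implies that a test reads the `/metrics` response and checks its samples. As a rough sketch of what such a check might involve, the helper below parses a body in the Prometheus text exposition format into a dictionary. It is an assumption for illustration, not the step implementation from this commit; the function name `parse_prometheus_text` and the metric names in the usage example are hypothetical.

```python
def parse_prometheus_text(body):
    """Parse Prometheus text-format samples into {sample_key: float_value}.

    Comment lines (# HELP, # TYPE) and blank lines are skipped. The key is the
    metric name including its label set, if any. An optional trailing
    timestamp is ignored.
    """
    samples = {}
    for raw in body.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Sample lines look like: metric_name[{labels}] value [timestamp]
        if "}" in line:
            head, _, rest = line.rpartition("}")
            key = head + "}"
        else:
            key, _, rest = line.partition(" ")
        fields = rest.split()
        if not fields:
            raise ValueError(f"sample line has no value: {raw!r}")
        samples[key] = float(fields[0])
    return samples
```

A usage example with made-up metric names:

```python
body = (
    "# HELP example_tokens_total Hypothetical counter.\n"
    "# TYPE example_tokens_total counter\n"
    "example_tokens_total 42\n"
)
metrics = parse_prometheus_text(body)
assert metrics["example_tokens_total"] == 42.0
```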