path: root/examples/server/README.md
author    Olivier Chafik <ochafik@users.noreply.github.com>    2024-04-12 19:43:38 +0100
committer GitHub <noreply@github.com>    2024-04-12 19:43:38 +0100
commit    ab9a3240a9da941fdef5cd4a25f2b97c2f5a67aa (patch)
tree      aa2efe58bc95a650827db07c83eb8bc0e026162c /examples/server/README.md
parent    fbbc030ba93561fac842af994c5c6c4c1147f13b (diff)
JSON schema conversion: ⚡️ faster repetitions, min/maxLength for strings, cap number length (#6555)
* json: rename python schema converter to make import easier
* server: skip null json_schema / grammar fields
* json: deps management for primitive rules (+ allow null values)
* json: optimize repetitions for minItems/maxItems and regexps: `a{,3}` goes from `"a"? "a"? "a"?` (explosive combos) to `(a (a (a)?)?)?`
* grammars: add troubleshooting section to readme
* json: cap length of numbers to 15 digits before/after decimal point (avoids infinite gen, e.g. "one third" -> `0.333333333333...`)
* json: unify all repetition code (w/ or w/o sep)
* json: support string minLength/maxLength
* server+json: update server/README w/ result_format
* nits
* json: fix type error w/ python 3.8
* json: fix server/README (json_schema in /completion vs. result_format in /v1/chat/completions)
* json: simplify DOT `{"type": "string", "pattern": "^.$"}`
* json: remove recursion in opt_repetitions (avoids Python stack overflow)
* json: rm dead code
* json: rm useless assert & ggml.h import
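The repetition rewrite is the performance-critical piece of this commit: `N` independent optional items multiply into an explosion of parse combinations, while right-nested optionals admit exactly one parse per match length. Below is a minimal Python sketch of that nesting, illustrative only; the actual `opt_repetitions` function mentioned above also handles separators and minimum counts.

```python
def nest_optionals(item: str, max_n: int) -> str:
    """Rewrite item{0,max_n} as right-nested optionals,
    e.g. "a"{,3} becomes (a (a (a)?)?)? instead of "a"? "a"? "a"?."""
    out = ""
    for _ in range(max_n):
        out = f"({item} {out})?" if out else f"({item})?"
    return out

print(nest_optionals("a", 3))  # -> (a (a (a)?)?)?
```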
Diffstat (limited to 'examples/server/README.md')
-rw-r--r--    examples/server/README.md    5
1 file changed, 5 insertions(+), 0 deletions(-)
diff --git a/examples/server/README.md b/examples/server/README.md
index a6fc92ea..918ac129 100644
--- a/examples/server/README.md
+++ b/examples/server/README.md
@@ -11,6 +11,7 @@ Set of LLM REST APIs and a simple web front end to interact with llama.cpp.
* Continuous batching
* Multimodal (wip)
* Monitoring endpoints
+ * Schema-constrained JSON response format
The project is under active development, and we are [looking for feedback and contributors](https://github.com/ggerganov/llama.cpp/issues/4216).
@@ -250,6 +251,8 @@ node index.js
`grammar`: Set grammar for grammar-based sampling. Default: no grammar
+ `json_schema`: Set a JSON schema for grammar-based sampling (e.g. `{"items": {"type": "string"}, "minItems": 10, "maxItems": 100}` for a list of strings, or `{}` for any JSON). See [tests](../../tests/test-json-schema-to-grammar.cpp) for supported features. Default: no JSON schema. (An illustrative request is sketched after this hunk.)
+
`seed`: Set the random number generator (RNG) seed. Default: `-1`, which is a random seed.
`ignore_eos`: Ignore end of stream token and continue generating. Default: `false`
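To illustrate the new `json_schema` field added above, a `/completion` request could look like the following sketch. This is a hypothetical example: the host, port, prompt, and schema are placeholders, assuming a llama.cpp server listening on `localhost:8080`.

```python
import requests

# Hypothetical example: constrain /completion output to a JSON array of strings.
resp = requests.post(
    "http://localhost:8080/completion",
    json={
        "prompt": "List some fruits as a JSON array:",
        "json_schema": {"items": {"type": "string"}, "minItems": 10, "maxItems": 100},
    },
)
print(resp.json()["content"])  # the generated, schema-constrained text
```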
@@ -365,6 +368,8 @@ Notice that each `probs` is an array of length `n_probs`.
See [OpenAI Chat Completions API documentation](https://platform.openai.com/docs/api-reference/chat). While some OpenAI-specific features such as function calling aren't supported, llama.cpp `/completion`-specific features such as `mirostat` are supported.
+ The `response_format` parameter supports both plain JSON output (e.g. `{"type": "json_object"}`) and schema-constrained JSON (e.g. `{"type": "json_object", "schema": {"type": "string", "minLength": 10, "maxLength": 100}}`), similar to other OpenAI-inspired API providers.
+
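For the OpenAI-compatible endpoint, the `response_format` parameter from the hunk above could be exercised with a raw HTTP request such as this sketch (hypothetical host and message; the response shape follows the OpenAI chat completions format):

```python
import requests

# Hypothetical example: request schema-constrained JSON via /v1/chat/completions.
resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "messages": [{"role": "user", "content": "Introduce yourself briefly."}],
        "response_format": {
            "type": "json_object",
            "schema": {"type": "string", "minLength": 10, "maxLength": 100},
        },
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```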
*Examples:*
You can use either the Python `openai` library with appropriate checkpoints: