`main`: add --json-schema / -j flag (#6659)

* main: add --json-schema / -j
* json: move json-schema-to-grammar to common lib
* json: fix zig build
author: Olivier Chafik <ochafik@users.noreply.github.com> 2024-04-15 18:35:21 +0100
committer: GitHub <noreply@github.com> 2024-04-15 18:35:21 +0100
commit: 7593639ce335e8d7f89aa9a54d616951f273af60 (patch)
tree: 936e7ef3214f03ebf1698292022be1a23ee991b0 /examples
parent: 132f55795e51094954f1b1f647f97648be724a3a (diff)
2 files changed, 4 insertions, 2 deletions
diff --git a/examples/main/README.md b/examples/main/README.md
index 10a589ce..649f4e0f 100644
--- a/examples/main/README.md
+++ b/examples/main/README.md
@@ -304,10 +304,12 @@ These options help improve the performance and memory usage of the LLaMA models.
 
 -   `--prompt-cache FNAME`: Specify a file to cache the model state after the initial prompt. This can significantly speed up the startup time when you're using longer prompts. The file is created during the first run and is reused and updated in subsequent runs. **Note**: Restoring a cached prompt does not imply restoring the exact state of the session at the point it was saved. So even when specifying a specific seed, you are not guaranteed to get the same sequence of tokens as the original generation.
 
-### Grammars
+### Grammars & JSON schemas
 
 -   `--grammar GRAMMAR`, `--grammar-file FILE`: Specify a grammar (defined inline or in a file) to constrain model output to a specific format. For example, you could force the model to output JSON or to speak only in emojis. See the [GBNF guide](../../grammars/README.md) for details on the syntax.
 
+-   `--json-schema SCHEMA`: Specify a [JSON schema](https://json-schema.org/) to constrain model output to (e.g. `{}` for any JSON object, or `{"items": {"type": "string", "minLength": 10, "maxLength": 100}, "minItems": 10}` for a JSON array of strings with size constraints). If a schema uses external `$ref`s, you should use `--grammar "$( python examples/json_schema_to_grammar.py myschema.json )"` instead.
+
 ### Quantization
 
 For information about 4-bit quantization, which can significantly improve performance and reduce memory usage, please refer to llama.cpp's primary [README](../../README.md#prepare-and-quantize).
diff --git a/examples/server/CMakeLists.txt b/examples/server/CMakeLists.txt
index d2ee47d0..61f58417 100644
--- a/examples/server/CMakeLists.txt
+++ b/examples/server/CMakeLists.txt
@@ -11,7 +11,7 @@ install(TARGETS ${TARGET} RUNTIME)
 target_compile_definitions(${TARGET} PRIVATE
     SERVER_VERBOSE=$<BOOL:${LLAMA_SERVER_VERBOSE}>
 )
-target_link_libraries(${TARGET} PRIVATE common json-schema-to-grammar ${CMAKE_THREAD_LIBS_INIT})
+target_link_libraries(${TARGET} PRIVATE common ${CMAKE_THREAD_LIBS_INIT})
 if (LLAMA_SERVER_SSL)
     find_package(OpenSSL REQUIRED)
     target_link_libraries(${TARGET} PRIVATE OpenSSL::SSL OpenSSL::Crypto)
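
As a rough illustration of the `--json-schema` option documented in the README hunk above, the following Python sketch builds a command line that passes the README's example schema inline and checks whether a piece of output satisfies that one schema. The binary path `./main`, the model path `model.gguf`, the prompt, and the helper names `build_command` and `matches_example_schema` are assumptions made for this example; they are not part of the commit. The checker is hand-written for this single schema and treats the value as an array, as the README describes; it is not a general JSON Schema validator.

```python
import json
import shlex

# Schema copied from the README text in this commit: an array of at least
# 10 strings, each between 10 and 100 characters long.
SCHEMA = {
    "items": {"type": "string", "minLength": 10, "maxLength": 100},
    "minItems": 10,
}


def build_command(schema, binary="./main", model="model.gguf",
                  prompt="List ten facts."):
    """Return an argv list that passes the schema inline via --json-schema.

    The binary path, model path and prompt are placeholders (assumptions).
    """
    return [binary, "-m", model, "-p", prompt,
            "--json-schema", json.dumps(schema)]


def matches_example_schema(text):
    """Check text against SCHEMA only; not a general JSON Schema validator."""
    try:
        value = json.loads(text)
    except json.JSONDecodeError:
        return False
    # The README describes this schema as a JSON array, so require a list.
    if not isinstance(value, list) or len(value) < SCHEMA["minItems"]:
        return False
    item = SCHEMA["items"]
    return all(
        isinstance(s, str)
        and item["minLength"] <= len(s) <= item["maxLength"]
        for s in value
    )


if __name__ == "__main__":
    # Print a shell-quoted command line that could be pasted into a terminal.
    print(shlex.join(build_command(SCHEMA)))
```

Serializing the schema with `json.dumps` and quoting it with `shlex.join` avoids hand-escaping the braces and double quotes on the shell command line. For schemas with external `$ref`s, the README recommends the `--grammar "$( python examples/json_schema_to_grammar.py myschema.json )"` form instead.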