author     Evan Jones <evan.q.jones@gmail.com>      2023-08-22 21:01:57 -0400
committer  GitHub <noreply@github.com>              2023-08-22 21:01:57 -0400
commit     f5fe98d11bdf9e7797bcfb05c0c3601ffc4b9d26 (patch)
tree       df23a6c7ae3cd6af1e3c961c505fab7f3928c7ee /examples/main/README.md
parent     777f42ba18b29f25c71ff8de3ecf97b8017304c0 (diff)
docs : add grammar docs (#2701)
* docs : add grammar docs
* tweaks to grammar guide
* rework GBNF example to be a commented grammar
Diffstat (limited to 'examples/main/README.md')
-rw-r--r--  examples/main/README.md  4
1 file changed, 4 insertions(+), 0 deletions(-)
diff --git a/examples/main/README.md b/examples/main/README.md
index 60e3907d..d555afdc 100644
--- a/examples/main/README.md
+++ b/examples/main/README.md
@@ -288,6 +288,10 @@ These options help improve the performance and memory usage of the LLaMA models.
 
 - `--prompt-cache FNAME`: Specify a file to cache the model state after the initial prompt. This can significantly speed up the startup time when you're using longer prompts. The file is created during the first run and is reused and updated in subsequent runs. **Note**: Restoring a cached prompt does not imply restoring the exact state of the session at the point it was saved. So even when specifying a specific seed, you are not guaranteed to get the same sequence of tokens as the original generation.
 
+### Grammars
+
+- `--grammar GRAMMAR`, `--grammar-file FILE`: Specify a grammar (defined inline or in a file) to constrain model output to a specific format. For example, you could force the model to output JSON or to speak only in emojis. See the [GBNF guide](../../grammars/README.md) for details on the syntax.
+
 ### Quantization
 
 For information about 4-bit quantization, which can significantly improve performance and reduce memory usage, please refer to llama.cpp's primary [README](../../README.md#prepare-data--run).
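For context on the `--grammar` flag this commit documents: it accepts a GBNF grammar, where `root` is the entry rule and `|` separates alternatives. A minimal sketch of an inline grammar that restricts the model to a one-word yes/no answer (the rule content here is illustrative, not from the commit):

```
# root is the mandatory entry rule; output must match it exactly,
# so the model can only emit "yes" or "no"
root ::= "yes" | "no"
```

Such a grammar could be passed inline via `--grammar 'root ::= "yes" | "no"'` or saved to a file and passed with `--grammar-file`, per the flag descriptions in the patch above.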