author | Behnam M <58621210+ibehnam@users.noreply.github.com> | 2024-01-09 05:02:05 -0500
committer | GitHub <noreply@github.com> | 2024-01-09 12:02:05 +0200
commit | 128de3585b0f58b1e562733448fc00109f23a95d (patch)
tree | 74a500be284dc8305c002e30cab0397ca39cbe05
parent | 8c5833031857c9e9ada61948bae894ab9c785f86 (diff)
server : update readme about token probs (#4777)
* updated server readme to reflect the gg/server-token-probs-4088 commit
added an explanation of the API's completion result, which now includes `completion_probabilities`. Also added a JSON schema that shows the type/structure of `completion_probabilities`.
* simplified the `completion_probabilities` JSON schema
It's now easier to understand what the structure of `completion_probabilities` looks like.
* minor : fix trailing whitespace
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
-rw-r--r-- | examples/server/README.md | 59
1 file changed, 34 insertions, 25 deletions
diff --git a/examples/server/README.md b/examples/server/README.md
index 5d982962..d85a14f8 100644
--- a/examples/server/README.md
+++ b/examples/server/README.md
@@ -175,35 +175,44 @@ node index.js
 
     `system_prompt`: Change the system prompt (initial prompt of all slots), this is useful for chat applications. [See more](#change-system-prompt-on-runtime)
 
-    *Result JSON:*
+### Result JSON:
 
-    Note: When using streaming mode (`stream`) only `content` and `stop` will be returned until end of completion.
+* Note: When using streaming mode (`stream`) only `content` and `stop` will be returned until end of completion.
 
-    `content`: Completion result as a string (excluding `stopping_word` if any). In case of streaming mode, will contain the next token as a string.
-
-    `stop`: Boolean for use with `stream` to check whether the generation has stopped (Note: This is not related to stopping words array `stop` from input options)
+- `completion_probabilities`: An array of token probabilities for each completion. The array's length is `n_predict`. Each item in the array has the following structure:
 
-    `generation_settings`: The provided options above excluding `prompt` but including `n_ctx`, `model`
-    - `model`: The path to the model loaded with `-m`
-    - `prompt`: The provided `prompt`
-    - `stopped_eos`: Indicating whether the completion has stopped because it encountered the EOS token
-    - `stopped_limit`: Indicating whether the completion stopped because `n_predict` tokens were generated before stop words or EOS was encountered
-    - `stopped_word`: Indicating whether the completion stopped due to encountering a stopping word from `stop` JSON array provided
-    - `stopping_word`: The stopping word encountered which stopped the generation (or "" if not stopped due to a stopping word)
-    - `timings`: Hash of timing information about the completion such as the number of tokens `predicted_per_second`
-    - `tokens_cached`: Number of tokens from the prompt which could be re-used from previous completion (`n_past`)
-    - `tokens_evaluated`: Number of tokens evaluated in total from the prompt
-    - `truncated`: Boolean indicating if the context size was exceeded during generation, i.e. the number of tokens provided in the prompt (`tokens_evaluated`) plus tokens generated (`tokens predicted`) exceeded the context size (`n_ctx`)
+```
+{
+  "content": "<the token selected by the model>",
+  "probs": [
+    {
+      "prob": float,
+      "tok_str": "<most likely token>"
+    },
+    {
+      "prob": float,
+      "tok_str": "<second most likely token>"
+    },
+    ...
+  ]
+},
+```
+Notice that each `probs` is an array of length `n_probs`.
+
+- `content`: Completion result as a string (excluding `stopping_word` if any). In case of streaming mode, will contain the next token as a string.
+- `stop`: Boolean for use with `stream` to check whether the generation has stopped (Note: This is not related to stopping words array `stop` from input options)
+- `generation_settings`: The provided options above excluding `prompt` but including `n_ctx`, `model`
+- `model`: The path to the model loaded with `-m`
+- `prompt`: The provided `prompt`
+- `stopped_eos`: Indicating whether the completion has stopped because it encountered the EOS token
+- `stopped_limit`: Indicating whether the completion stopped because `n_predict` tokens were generated before stop words or EOS was encountered
+- `stopped_word`: Indicating whether the completion stopped due to encountering a stopping word from `stop` JSON array provided
+- `stopping_word`: The stopping word encountered which stopped the generation (or "" if not stopped due to a stopping word)
+- `timings`: Hash of timing information about the completion such as the number of tokens `predicted_per_second`
+- `tokens_cached`: Number of tokens from the prompt which could be re-used from previous completion (`n_past`)
+- `tokens_evaluated`: Number of tokens evaluated in total from the prompt
+- `truncated`: Boolean indicating if the context size was exceeded during generation, i.e. the number of tokens provided in the prompt (`tokens_evaluated`) plus tokens generated (`tokens predicted`) exceeded the context size (`n_ctx`)
 
 -   **POST** `/tokenize`: Tokenize a given text.
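For context, a minimal client-side sketch of the result structure documented above. It assumes a llama.cpp server already running on the default `localhost:8080` and uses the `/completion` endpoint's `n_predict` and `n_probs` request fields referenced in this README; the prompt text and printing logic are illustrative only, not part of the commit.

```python
# Minimal sketch: request a short completion with token probabilities
# enabled, then walk the `completion_probabilities` array described above.
# Assumes a llama.cpp server is listening on localhost:8080 (adjust as needed).
import json
import urllib.request

payload = {
    "prompt": "Building a website can be done in 10 simple steps:",
    "n_predict": 4,  # `completion_probabilities` should have this many items
    "n_probs": 3,    # each item's `probs` array should have this length
}

req = urllib.request.Request(
    "http://localhost:8080/completion",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    result = json.load(resp)

print(result["content"])

# Each item pairs the selected token with its `n_probs` most likely candidates.
for item in result.get("completion_probabilities", []):
    candidates = ", ".join(f"{p['tok_str']!r}: {p['prob']:.3f}" for p in item["probs"])
    print(f"{item['content']!r} <- [{candidates}]")
```

The non-streaming case is shown deliberately: per the Note above, streaming mode returns only `content` and `stop` until the end of completion, so a single non-streaming request is the simplest way to inspect the probabilities.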