author    Daniel Bevenius <daniel.bevenius@gmail.com>	2024-05-09 13:03:29 +0200
committer GitHub <noreply@github.com>	2024-05-09 14:03:29 +0300
commit    fd9f92b154850014146f61717cd292a59a5cee5a (patch)
tree      eeafd2cc566ad3617bed9e764aab5b98a56eef2d /convert-hf-to-gguf-update.py
parent    22842164bcae3251b81ad9e497a16ef66833cb9e (diff)
llama : update llama_timings.n_p_eval setting (#7160)
This commit changes the value assigned to llama_timings.n_p_eval when ctx->n_p_eval is 0 to be 0 instead of 1, which is the current value.

The motivation for this change is that if session caching is enabled, for example using the `--prompt-cache main-session.txt` command line argument for the main example, and the same prompt is used on subsequent runs, the prompt tokens will not actually be passed to llama_decode, so n_p_eval will not be updated by llama_synchronize. But the value of n_p_eval will be set to 1 by llama_get_timings because ctx->n_p_eval will be 0. This could be interpreted as meaning that 1 token was evaluated for the prompt, which could be misleading for applications using this value.

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
Diffstat (limited to 'convert-hf-to-gguf-update.py')
0 files changed, 0 insertions, 0 deletions