author | Daniel Bevenius <daniel.bevenius@gmail.com> | 2024-05-09 13:03:29 +0200
committer | GitHub <noreply@github.com> | 2024-05-09 14:03:29 +0300
commit | fd9f92b154850014146f61717cd292a59a5cee5a (patch)
tree | eeafd2cc566ad3617bed9e764aab5b98a56eef2d /llama.cpp
parent | 22842164bcae3251b81ad9e497a16ef66833cb9e (diff)
llama : update llama_timings.n_p_eval setting (#7160)
This commit changes the value assigned to llama_timings.n_p_eval when
ctx->n_p_eval is 0 to be 0 instead of 1, which is the current value.
The motivation for this change is that if session caching is enabled,
for example using the `--prompt-cache main-session.txt` command line
argument for the main example, and the same prompt is used on
subsequent runs, then the prompt tokens will not actually be passed to
llama_decode, and n_p_eval will not be updated by llama_synchronize.
But the value of n_p_eval will be set to 1 by llama_get_timings because
ctx->n_p_eval will be 0. This could be interpreted as meaning that 1
token was evaluated for the prompt, which could be misleading for
applications using this value.
Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
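
To illustrate the effect on callers, here is a minimal sketch of an
application reading the timings through llama_get_timings (which, at this
commit, returns a struct llama_timings with t_p_eval_ms and n_p_eval
fields). The helper print_prompt_speed is hypothetical and not part of
llama.cpp; it only shows why clamping n_p_eval to 1 was misleading:

#include <cstdio>
#include "llama.h"

// Hypothetical helper: report prompt-evaluation speed from the timings.
static void print_prompt_speed(struct llama_context * ctx) {
    const struct llama_timings timings = llama_get_timings(ctx);
    if (timings.n_p_eval > 0) {
        printf("prompt eval: %d tokens (%.2f ms per token)\n",
               timings.n_p_eval, timings.t_p_eval_ms / timings.n_p_eval);
    } else {
        // With this change n_p_eval can be 0, e.g. when the whole prompt
        // was restored from --prompt-cache and never passed to llama_decode.
        // Before the change it was clamped to 1, which would have reported
        // a bogus "1 token" prompt evaluation here.
        printf("prompt eval: no prompt tokens were evaluated\n");
    }
}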
Diffstat (limited to 'llama.cpp')
-rw-r--r-- | llama.cpp | 2
1 file changed, 1 insertion(+), 1 deletion(-)
@@ -17879,7 +17879,7 @@ struct llama_timings llama_get_timings(struct llama_context * ctx) {
         /*.t_eval_ms   =*/ 1e-3 * ctx->t_eval_us,

         /*.n_sample =*/ std::max(1, ctx->n_sample),
-        /*.n_p_eval =*/ std::max(1, ctx->n_p_eval),
+        /*.n_p_eval =*/ std::max(0, ctx->n_p_eval),
         /*.n_eval   =*/ std::max(1, ctx->n_eval),
     };