From 532dd74e38c29e16ea1cfc4e7eedb4f2fab3f3cd Mon Sep 17 00:00:00 2001
From: Richard Kiss
Date: Sat, 11 Nov 2023 22:04:58 -0800
Subject: Fix some documentation typos/grammar mistakes (#4032)

* typos

* Update examples/parallel/README.md

Co-authored-by: Kerfuffle <44031344+KerfuffleV2@users.noreply.github.com>

---------

Co-authored-by: Kerfuffle <44031344+KerfuffleV2@users.noreply.github.com>
---
 docs/token_generation_performance_tips.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'docs')

diff --git a/docs/token_generation_performance_tips.md b/docs/token_generation_performance_tips.md
index c9acff7d..d7e863df 100644
--- a/docs/token_generation_performance_tips.md
+++ b/docs/token_generation_performance_tips.md
@@ -17,7 +17,7 @@ llama_model_load_internal: [cublas] total VRAM used: 17223 MB
 If you see these lines, then the GPU is being used.
 
 ## Verifying that the CPU is not oversaturated
-llama accepts a `-t N` (or `--threads N`) parameter. It's extremely important that this parameter is not too large. If your token generation is extremely slow, try setting this number to 1. If this significantly improves your token generation speed, then your CPU is being oversaturated and you need to explicitly set this parameter to the number of the physicial CPU cores on your machine (even if you utilize a GPU). If in doubt, start with 1 and double the amount until you hit a performance bottleneck, then scale the number down.
+llama accepts a `-t N` (or `--threads N`) parameter. It's extremely important that this parameter is not too large. If your token generation is extremely slow, try setting this number to 1. If this significantly improves your token generation speed, then your CPU is being oversaturated and you need to explicitly set this parameter to the number of the physical CPU cores on your machine (even if you utilize a GPU). If in doubt, start with 1 and double the amount until you hit a performance bottleneck, then scale the number down.
 
 # Example of runtime flags effect on inference speed benchmark
 These runs were tested on the following machine:
--
cgit v1.2.3
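
For reference, the `-t`/`--threads` flag the corrected paragraph documents is set like this (a minimal sketch, not part of this commit; the `main` binary name, model path, prompt, and the core count of 8 are all assumed placeholders):

    # Hypothetical run on a machine assumed to have 8 physical cores, so -t 8.
    # -m selects the model file, -p the prompt, -n the number of tokens to generate.
    ./main -m ./models/model.gguf -p "Hello" -n 64 -t 8

Per the corrected text, if generation is slow, retry the same command with `-t 1`; a large speedup suggests the CPU was oversaturated.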