diff options
Diffstat (limited to 'docs')
-rw-r--r-- | docs/HOWTO-add-model.md | 2 | ||||
-rw-r--r-- | docs/token_generation_performance_tips.md | 4 |
2 files changed, 3 insertions, 3 deletions
diff --git a/docs/HOWTO-add-model.md b/docs/HOWTO-add-model.md index 13812424..3eec077e 100644 --- a/docs/HOWTO-add-model.md +++ b/docs/HOWTO-add-model.md @@ -100,7 +100,7 @@ Have a look at existing implementation like `build_llama`, `build_dbrx` or `buil When implementing a new graph, please note that the underlying `ggml` backends might not support them all, support for missing backend operations can be added in another PR. -Note: to debug the inference graph: you can use [eval-callback](../examples/eval-callback). +Note: to debug the inference graph: you can use [llama-eval-callback](../examples/eval-callback). ## GGUF specification diff --git a/docs/token_generation_performance_tips.md b/docs/token_generation_performance_tips.md index 3c434314..c0840cad 100644 --- a/docs/token_generation_performance_tips.md +++ b/docs/token_generation_performance_tips.md @@ -3,7 +3,7 @@ ## Verifying that the model is running on the GPU with CUDA Make sure you compiled llama with the correct env variables according to [this guide](../README.md#CUDA), so that llama accepts the `-ngl N` (or `--n-gpu-layers N`) flag. When running llama, you may configure `N` to be very large, and llama will offload the maximum possible number of layers to the GPU, even if it's less than the number you configured. For example: ```shell -./main -m "path/to/model.gguf" -ngl 200000 -p "Please sir, may I have some " +./llama-cli -m "path/to/model.gguf" -ngl 200000 -p "Please sir, may I have some " ``` When running llama, before it starts the inference work, it will output diagnostic information that shows whether cuBLAS is offloading work to the GPU. Look for these lines: @@ -27,7 +27,7 @@ RAM: 32GB Model: `TheBloke_Wizard-Vicuna-30B-Uncensored-GGML/Wizard-Vicuna-30B-Uncensored.q4_0.gguf` (30B parameters, 4bit quantization, GGML) -Run command: `./main -m "path/to/model.gguf" -p "An extremely detailed description of the 10 best ethnic dishes will follow, with recipes: " -n 1000 [additional benchmark flags]` +Run command: `./llama-cli -m "path/to/model.gguf" -p "An extremely detailed description of the 10 best ethnic dishes will follow, with recipes: " -n 1000 [additional benchmark flags]` Result: |