author | slaren <slarengh@gmail.com> | 2024-03-26 01:16:01 +0100 |
---|---|---|
committer | GitHub <noreply@github.com> | 2024-03-26 01:16:01 +0100 |
commit | 280345968dabc00d212d43e31145f5c9961a7604 (patch) | |
tree | 4d0ada8b59a4c15cb6d4fe1a6b4740a30dcdb0f2 /docs | |
parent | b06c16ef9f81d84da520232c125d4d8a1d273736 (diff) | |
cuda : rename build flag to LLAMA_CUDA (#6299)
Diffstat (limited to 'docs')
-rw-r--r-- | docs/token_generation_performance_tips.md | 4 |
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/docs/token_generation_performance_tips.md b/docs/token_generation_performance_tips.md
index d7e863df..3c434314 100644
--- a/docs/token_generation_performance_tips.md
+++ b/docs/token_generation_performance_tips.md
@@ -1,7 +1,7 @@
 # Token generation performance troubleshooting
 
-## Verifying that the model is running on the GPU with cuBLAS
-Make sure you compiled llama with the correct env variables according to [this guide](../README.md#cublas), so that llama accepts the `-ngl N` (or `--n-gpu-layers N`) flag. When running llama, you may configure `N` to be very large, and llama will offload the maximum possible number of layers to the GPU, even if it's less than the number you configured. For example:
+## Verifying that the model is running on the GPU with CUDA
+Make sure you compiled llama with the correct env variables according to [this guide](../README.md#CUDA), so that llama accepts the `-ngl N` (or `--n-gpu-layers N`) flag. When running llama, you may configure `N` to be very large, and llama will offload the maximum possible number of layers to the GPU, even if it's less than the number you configured. For example:
 ```shell
 ./main -m "path/to/model.gguf" -ngl 200000 -p "Please sir, may I have some "
 ```
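
For context, a minimal sketch of what this rename implies when building with GPU support, assuming the Make and CMake builds of llama.cpp around the time of this commit (exact targets and options may differ in your checkout):

```shell
# Make build: the option formerly spelled LLAMA_CUBLAS=1 is now LLAMA_CUDA=1
# (naming taken from this commit's title; verify against your tree's README)
make clean
make LLAMA_CUDA=1

# CMake build with the correspondingly renamed option (same assumption)
cmake -B build -DLLAMA_CUDA=ON
cmake --build build --config Release

# Then offload layers to the GPU as described in the doc above
./main -m "path/to/model.gguf" -ngl 200000 -p "Please sir, may I have some "
```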