summaryrefslogtreecommitdiff
path: root/docs/build.md
diff options
context:
space:
mode:
Diffstat (limited to 'docs/build.md')
-rw-r--r--docs/build.md19
1 files changed, 18 insertions, 1 deletions
diff --git a/docs/build.md b/docs/build.md
index d9d12c46..8b16d1a3 100644
--- a/docs/build.md
+++ b/docs/build.md
@@ -178,7 +178,11 @@ For Jetson user, if you have Jetson Orin, you can try this: [Offical Support](ht
cmake --build build --config Release
```
-The environment variable [`CUDA_VISIBLE_DEVICES`](https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#env-vars) can be used to specify which GPU(s) will be used. The following compilation options are also available to tweak performance:
+The environment variable [`CUDA_VISIBLE_DEVICES`](https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#env-vars) can be used to specify which GPU(s) will be used.
+
+The environment variable `GGML_CUDA_ENABLE_UNIFIED_MEMORY=1` can be used to enable unified memory in Linux. This allows swapping to system RAM instead of crashing when the GPU VRAM is exhausted. In Windows this setting is available in the NVIDIA control panel as `System Memory Fallback`.
+
+The following compilation options are also available to tweak performance:
| Option | Legal values | Default | Description |
|-------------------------------|------------------------|---------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
@@ -192,6 +196,19 @@ The environment variable [`CUDA_VISIBLE_DEVICES`](https://docs.nvidia.com/cuda/c
| GGML_CUDA_PEER_MAX_BATCH_SIZE | Positive integer | 128 | Maximum batch size for which to enable peer access between multiple GPUs. Peer access requires either Linux or NVLink. When using NVLink enabling peer access for larger batch sizes is potentially beneficial. |
| GGML_CUDA_FA_ALL_QUANTS | Boolean | false | Compile support for all KV cache quantization type (combinations) for the FlashAttention CUDA kernels. More fine-grained control over KV cache size but compilation takes much longer. |
+### MUSA
+
+- Using `make`:
+ ```bash
+ make GGML_MUSA=1
+ ```
+- Using `CMake`:
+
+ ```bash
+ cmake -B build -DGGML_MUSA=ON
+ cmake --build build --config Release
+ ```
+
### hipBLAS
This provides BLAS acceleration on HIP-supported AMD GPUs.