Diffstat (limited to 'README.md')
-rw-r--r-- | README.md | 116
1 file changed, 3 insertions(+), 113 deletions(-)
@@ -77,7 +77,7 @@ variety of hardware - locally and in the cloud.
 - AVX, AVX2 and AVX512 support for x86 architectures
 - 1.5-bit, 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, and 8-bit integer quantization for faster inference and reduced memory use
 - Custom CUDA kernels for running LLMs on NVIDIA GPUs (support for AMD GPUs via HIP)
-- Vulkan, SYCL, and (partial) OpenCL backend support
+- Vulkan and SYCL backend support
 - CPU+GPU hybrid inference to partially accelerate models larger than the total VRAM capacity
 
 Since its [inception](https://github.com/ggerganov/llama.cpp/issues/33#issuecomment-1465108022), the project has
@@ -371,16 +371,11 @@ In order to build llama.cpp you have four different options.
 3. Install compilation dependencies.
 
     ```bash
-    sudo pkg install gmake automake autoconf pkgconf llvm15 clinfo clover \
-       opencl clblast openblas
+    sudo pkg install gmake automake autoconf pkgconf llvm15 openblas
 
     gmake CC=/usr/local/bin/clang15 CXX=/usr/local/bin/clang++15 -j4
     ```
 
-    **Notes:** With this packages you can build llama.cpp with OPENBLAS and
-    CLBLAST support for use OpenCL GPU acceleration in FreeBSD. Please read
-    the instructions for use and activate this options in this document below.
-
 ### Homebrew
 
 On Mac and Linux, the homebrew package manager can be used via
@@ -399,7 +394,7 @@ argument.
 
 ### BLAS Build
 
-Building the program with BLAS support may lead to some performance improvements in prompt processing using batch sizes higher than 32 (the default is 512). Support with CPU-only BLAS implementations doesn't affect the normal generation performance. We may see generation performance improvements with GPU-involved BLAS implementations, e.g. cuBLAS, hipBLAS and CLBlast. There are currently several different BLAS implementations available for build and use:
+Building the program with BLAS support may lead to some performance improvements in prompt processing using batch sizes higher than 32 (the default is 512). Support with CPU-only BLAS implementations doesn't affect the normal generation performance. We may see generation performance improvements with GPU-involved BLAS implementations, e.g. cuBLAS, hipBLAS. There are currently several different BLAS implementations available for build and use:
 
 - #### Accelerate Framework:
 
@@ -553,111 +548,6 @@ Building the program with BLAS support may lead to some performance improvements
 | LLAMA_CUDA_MMV_Y | Positive integer | 1 | Block size in y direction for the HIP mul mat vec kernels. Increasing this value can improve performance on fast GPUs. Power of 2 recommended. Does not affect k-quants. |
 | LLAMA_CUDA_KQUANTS_ITER | 1 or 2 | 2 | Number of values processed per iteration and per HIP thread for Q2_K and Q6_K quantization formats. Setting this value to 1 can improve performance for slow GPUs. |
 
-- #### CLBlast
-
-  OpenCL acceleration is provided by the matrix multiplication kernels from the [CLBlast](https://github.com/CNugteren/CLBlast) project and custom kernels for ggml that can generate tokens on the GPU.
-
-  You will need the [OpenCL SDK](https://github.com/KhronosGroup/OpenCL-SDK).
-    - For Ubuntu, Debian, and Fedora the packages `opencl-headers`, `ocl-icd` may be needed.
-
-    - For Windows, a pre-built SDK is available on the [OpenCL Releases](https://github.com/KhronosGroup/OpenCL-SDK/releases) page.
-
-    - <details>
-        <summary>Installing the OpenCL SDK from source</summary>
-
-        ```sh
-        git clone --recurse-submodules https://github.com/KhronosGroup/OpenCL-SDK.git
-        cd OpenCL-SDK
-        cmake -B build -DBUILD_DOCS=OFF \
-          -DBUILD_EXAMPLES=OFF \
-          -DBUILD_TESTING=OFF \
-          -DOPENCL_SDK_BUILD_SAMPLES=OFF \
-          -DOPENCL_SDK_TEST_SAMPLES=OFF
-        cmake --build build
-        cmake --install build --prefix /some/path
-        ```
-      </details>
-
-  ##### Installing CLBlast
-
-  Pre-built CLBlast binaries may be found on the [CLBlast Releases](https://github.com/CNugteren/CLBlast/releases) page. For Unix variants, it may also be found in your operating system's packages.
-
-  Linux packaging:
-    Fedora Linux:
-    ```bash
-    sudo dnf install clblast
-    ```
-
-  Alternatively, they may be built from source.
-
-  - <details>
-      <summary>Windows:</summary>
-
-      ```cmd
-      set OPENCL_SDK_ROOT="C:/OpenCL-SDK-v2023.04.17-Win-x64"
-      git clone https://github.com/CNugteren/CLBlast.git
-      cd CLBlast
-      cmake -B build -DBUILD_SHARED_LIBS=OFF -DOVERRIDE_MSVC_FLAGS_TO_MT=OFF -DTUNERS=OFF -DOPENCL_ROOT=%OPENCL_SDK_ROOT% -G "Visual Studio 17 2022" -A x64
-      cmake --build build --config Release
-      cmake --install build --prefix C:/CLBlast
-      ```
-
-      (note: `--config Release` at build time is the default and only relevant for Visual Studio builds - or multi-config Ninja builds)
-    </details>
-
-  - <details>
-      <summary>Unix:</summary>
-
-      ```sh
-      git clone https://github.com/CNugteren/CLBlast.git
-      cd CLBlast
-      cmake -B build -DBUILD_SHARED_LIBS=OFF -DTUNERS=OFF
-      cmake --build build --config Release
-      cmake --install build --prefix /some/path
-      ```
-
-      Where `/some/path` is where the built library will be installed (default is `/usr/local`).
-    </details>
-
-  ##### Building Llama with CLBlast
-
-  - Build with make:
-    ```sh
-    make LLAMA_CLBLAST=1
-    ```
-  - CMake (Unix):
-    ```sh
-    cmake -B build -DLLAMA_CLBLAST=ON -DCLBlast_DIR=/some/path
-    cmake --build build --config Release
-    ```
-  - CMake (Windows):
-    ```cmd
-    set CL_BLAST_CMAKE_PKG="C:/CLBlast/lib/cmake/CLBlast"
-    git clone https://github.com/ggerganov/llama.cpp
-    cd llama.cpp
-    cmake -B build -DBUILD_SHARED_LIBS=OFF -DLLAMA_CLBLAST=ON -DCMAKE_PREFIX_PATH=%CL_BLAST_CMAKE_PKG% -G "Visual Studio 17 2022" -A x64
-    cmake --build build --config Release
-    cmake --install build --prefix C:/LlamaCPP
-    ```
-
-  ##### Running Llama with CLBlast
-
-  The CLBlast build supports `--gpu-layers|-ngl` like the CUDA version does.
-
-  To select the correct platform (driver) and device (GPU), you can use the environment variables `GGML_OPENCL_PLATFORM` and `GGML_OPENCL_DEVICE`.
-  The selection can be a number (starting from 0) or a text string to search:
-
-  ```sh
-  GGML_OPENCL_PLATFORM=1 ./main ...
-  GGML_OPENCL_DEVICE=2 ./main ...
-  GGML_OPENCL_PLATFORM=Intel ./main ...
-  GGML_OPENCL_PLATFORM=AMD GGML_OPENCL_DEVICE=1 ./main ...
-  ```
-
-  The default behavior is to find the first GPU device, but when it is an integrated GPU on a laptop, for instance, the selectors are useful.
-  Using the variables it is possible to select a CPU-based driver as well, if so desired.
-
-  You can get a list of platforms and devices from the `clinfo -l` command, etc.
-
 - #### Vulkan
 
 **With docker**:
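
For readers landing on this diff: the removed `GGML_OPENCL_PLATFORM`/`GGML_OPENCL_DEVICE` workflow has no direct replacement, and GPU offload now goes through the remaining backends (CUDA, HIP, Vulkan, SYCL, Metal). As a rough sketch of the nearest equivalent path — not part of this diff — the following assumes the `LLAMA_VULKAN` CMake option documented in the Vulkan section of this same README; the model file and `-ngl` value are illustrative placeholders.

```bash
# Sketch of the post-removal GPU path: build with the Vulkan backend.
# Assumes the LLAMA_VULKAN option from the README's Vulkan section; the
# model path and layer count below are placeholders, not from this diff.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build -DLLAMA_VULKAN=1
cmake --build build --config Release

# Offload layers to the GPU, as the removed CLBlast build did via --gpu-layers|-ngl:
./build/bin/main -m model.gguf -p "Hello" -ngl 33
```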