author    | 0cc4m <picard12@live.de>    | 2024-03-29 17:29:21 +0100
committer | GitHub <noreply@github.com> | 2024-03-29 17:29:21 +0100
commit    | ba0c7c70ab5b15f1f2be7fb0dfbe0366dda30d6c (patch)
tree      | 041a10dd587c26c42171be18e0f587f1fca2feca /README.md
parent    | d48ccf3ad4fea5b9ede209c7f40be65371987bfe (diff)
Vulkan k-quant mmq and ggml-backend offload functionality (#6155)
* Fix Vulkan no kv offload incoherence
* Add k-quant mul mat mat shaders
* Rework working buffer allocation; reduces VRAM use noticeably
  Clean up CPU assist code; replaced with the ggml-backend offload function
* Default to all dedicated GPUs
* Add fallback for integrated GPUs if no dedicated GPUs are found
* Add debug info showing which device is allocating memory (see the device-enumeration sketch after this list)
* Fix Intel dequant issue
  Fix validation issue
* Fix Vulkan GGML_OP_GET_ROWS implementation
* Clean up merge artifacts
* Remove Vulkan warning
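
The device-selection and debug-info points above all go through ggml's Vulkan backend API. As a rough sketch of that surface (not part of this commit; it assumes a build with the Vulkan backend enabled that links against ggml, and uses the `ggml_backend_vk_*` declarations from `ggml-vulkan.h`), the following C program lists the devices ggml can see and brings up a backend on the first one:

```c
// Sketch: enumerate Vulkan devices and initialize a ggml-backend on one.
// Assumes ggml built with the Vulkan backend (LLAMA_VULKAN=1 at this commit).
#include <stdio.h>
#include "ggml-backend.h"
#include "ggml-vulkan.h"

int main(void) {
    const int n_dev = ggml_backend_vk_get_device_count();
    for (int i = 0; i < n_dev; i++) {
        char   desc[256];
        size_t free_mem, total_mem;
        ggml_backend_vk_get_device_description(i, desc, sizeof(desc));
        ggml_backend_vk_get_device_memory(i, &free_mem, &total_mem);
        printf("Vulkan device %d: %s (%zu of %zu MiB free)\n",
               i, desc, free_mem / (1024 * 1024), total_mem / (1024 * 1024));
    }

    // Tensors allocated in this backend's buffers are computed on the GPU
    // through ggml-backend, which is what replaces the old CPU-assist path.
    ggml_backend_t backend = ggml_backend_vk_init(0);
    if (backend == NULL) {
        fprintf(stderr, "failed to initialize the Vulkan backend\n");
        return 1;
    }
    ggml_backend_free(backend);
    return 0;
}
```

The "default to dedicated GPUs, fall back to integrated" behavior is a policy applied over exactly this enumeration, presumably by inspecting the Vulkan physical-device type and preferring discrete devices.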
Diffstat (limited to 'README.md')
-rw-r--r-- | README.md | 9
1 file changed, 0 insertions, 9 deletions
@@ -636,15 +636,6 @@ Building the program with BLAS support may lead to some performance improvements
 
 #### Vulkan
 
-> [!WARNING]
->
-> Vulkan support has been broken in https://github.com/ggerganov/llama.cpp/pull/6122
-> due to relying on `GGML_OP_GET_ROWS` which is not yet properly supported by the Vulkan backend,
-> but should be fixed relatively soon (possibly in https://github.com/ggerganov/llama.cpp/pull/6155
-> (ref: https://github.com/ggerganov/llama.cpp/pull/6122#issuecomment-2015327635)).
->
-> Meanwhile, if you want to use the Vulkan backend, you should use the commit right before the breaking change, https://github.com/ggerganov/llama.cpp/commit/55c1b2a3bbd470e9e2a3a0618b92cf64a885f806
-
 **With docker**: You don't need to install Vulkan SDK. It will be installed inside the container.
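
With the warning removed, the Vulkan backend is usable again through the normal ggml-backend offload path instead of a pinned older commit. As a minimal sketch of what that looks like from the llama.cpp C API at this commit (the model path is a placeholder, and several of these functions were renamed in later releases):

```c
// Sketch: load a model and let ggml-backend offload layers to the Vulkan GPU.
// Assumes llama.cpp as of this commit, built with LLAMA_VULKAN=1.
#include <stdio.h>
#include "llama.h"

int main(void) {
    llama_backend_init();

    struct llama_model_params mparams = llama_model_default_params();
    mparams.n_gpu_layers = 99; // offload as many layers as the model has;
                               // with Vulkan this now goes through ggml-backend
                               // rather than the removed CPU-assist code

    // "/path/to/model.gguf" is a placeholder
    struct llama_model * model =
        llama_load_model_from_file("/path/to/model.gguf", mparams);
    if (model == NULL) {
        fprintf(stderr, "failed to load model\n");
        llama_backend_free();
        return 1;
    }

    // ... create a context with llama_new_context_with_model() and run inference ...

    llama_free_model(model);
    llama_backend_free();
    return 0;
}
```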