author    | 0cc4m <picard12@live.de>    | 2024-03-29 17:29:21 +0100
committer | GitHub <noreply@github.com> | 2024-03-29 17:29:21 +0100
commit    | ba0c7c70ab5b15f1f2be7fb0dfbe0366dda30d6c (patch)
tree      | 041a10dd587c26c42171be18e0f587f1fca2feca /README.md
parent    | d48ccf3ad4fea5b9ede209c7f40be65371987bfe (diff)
Vulkan k-quant mmq and ggml-backend offload functionality (#6155)
* Fix Vulkan no kv offload incoherence
* Add k-quant mul mat mat shaders
* Rework working buffer allocation; reduces VRAM use noticeably
  Clean up CPU assist code; replaced with the ggml-backend offload function
* Default to all dedicated GPUs
* Add fallback for integrated GPUs if no dedicated GPUs are found
* Add debug info showing which device is allocating memory (see the device-enumeration sketch after this list)
* Fix Intel dequant issue
  Fix validation issue
* Fix Vulkan GGML_OP_GET_ROWS implementation
* Clean up merge artifacts
* Remove Vulkan warning
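
The device-selection and debug-info points above all go through ggml's Vulkan backend API. As a rough sketch of that surface (not part of this commit; it assumes a build with the Vulkan backend enabled that links against ggml, and uses the `ggml_backend_vk_*` declarations from `ggml-vulkan.h`), the following C program lists the devices ggml can see and brings up a backend on the first one:

```c
// Sketch: enumerate Vulkan devices and initialize a ggml-backend on one.
// Assumes ggml built with the Vulkan backend (LLAMA_VULKAN=1 at this commit).
#include <stdio.h>
#include "ggml-backend.h"
#include "ggml-vulkan.h"

int main(void) {
    const int n_dev = ggml_backend_vk_get_device_count();
    for (int i = 0; i < n_dev; i++) {
        char   desc[256];
        size_t free_mem, total_mem;
        ggml_backend_vk_get_device_description(i, desc, sizeof(desc));
        ggml_backend_vk_get_device_memory(i, &free_mem, &total_mem);
        printf("Vulkan device %d: %s (%zu of %zu MiB free)\n",
               i, desc, free_mem / (1024 * 1024), total_mem / (1024 * 1024));
    }

    // Tensors allocated in this backend's buffers are computed on the GPU
    // through ggml-backend, which is what replaces the old CPU-assist path.
    ggml_backend_t backend = ggml_backend_vk_init(0);
    if (backend == NULL) {
        fprintf(stderr, "failed to initialize the Vulkan backend\n");
        return 1;
    }
    ggml_backend_free(backend);
    return 0;
}
```

The "default to dedicated GPUs, fall back to integrated" behavior is a policy applied over exactly this enumeration, presumably by inspecting the Vulkan physical-device type and preferring discrete devices.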
Diffstat (limited to 'README.md')
-rw-r--r-- | README.md | 9
1 file changed, 0 insertions, 9 deletions
@@ -636,15 +636,6 @@ Building the program with BLAS support may lead to some performance improvements
 
 #### Vulkan
 
-> [!WARNING]
->
-> Vulkan support has been broken in https://github.com/ggerganov/llama.cpp/pull/6122
-> due to relying on `GGML_OP_GET_ROWS` which is not yet properly supported by the Vulkan backend,
-> but should be fixed relatively soon (possibly in https://github.com/ggerganov/llama.cpp/pull/6155
-> (ref: https://github.com/ggerganov/llama.cpp/pull/6122#issuecomment-2015327635)).
->
-> Meanwhile, if you want to use the Vulkan backend, you should use the commit right before the breaking change, https://github.com/ggerganov/llama.cpp/commit/55c1b2a3bbd470e9e2a3a0618b92cf64a885f806
-
 **With docker**: You don't need to install Vulkan SDK. It will be installed inside the container.
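
With the warning removed, the Vulkan backend is usable again through the normal ggml-backend offload path instead of a pinned older commit. As a minimal sketch of what that looks like from the llama.cpp C API at this commit (the model path is a placeholder, and several of these functions were renamed in later releases):

```c
// Sketch: load a model and let ggml-backend offload layers to the Vulkan GPU.
// Assumes llama.cpp as of this commit, built with LLAMA_VULKAN=1.
#include <stdio.h>
#include "llama.h"

int main(void) {
    llama_backend_init();

    struct llama_model_params mparams = llama_model_default_params();
    mparams.n_gpu_layers = 99; // offload as many layers as the model has;
                               // with Vulkan this now goes through ggml-backend
                               // rather than the removed CPU-assist code

    // "/path/to/model.gguf" is a placeholder
    struct llama_model * model =
        llama_load_model_from_file("/path/to/model.gguf", mparams);
    if (model == NULL) {
        fprintf(stderr, "failed to load model\n");
        llama_backend_free();
        return 1;
    }

    // ... create a context with llama_new_context_with_model() and run inference ...

    llama_free_model(model);
    llama_backend_free();
    return 0;
}
```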