llama : remove LLAMA_MAX_DEVICES and LLAMA_SUPPORTS_GPU_OFFLOAD (#5240)

* llama : remove LLAMA_MAX_DEVICES from llama.h

ggml-ci

* Update llama.cpp

Co-authored-by: slaren <slarengh@gmail.com>

* server : remove LLAMA_MAX_DEVICES

ggml-ci

* llama : remove LLAMA_SUPPORTS_GPU_OFFLOAD

ggml-ci

* train : remove LLAMA_SUPPORTS_GPU_OFFLOAD

* readme : add deprecation notice

* readme : change deprecation notice to "remove" and fix url

* llama : remove gpu includes from llama.h

ggml-ci

---------

Co-authored-by: slaren <slarengh@gmail.com>
author: Georgi Gerganov <ggerganov@gmail.com> 2024-01-31 17:30:17 +0200
committer: GitHub <noreply@github.com> 2024-01-31 17:30:17 +0200
commit: 5cb04dbc16d1da38c8fdcc0111b40e67d00dd1c3 (patch)
tree: 3ef8dc640d5c08466309c09a8ac2963bb760af06 /examples/batched-bench
parent: efb7bdbbd061d087c788598b97992c653f992ddd (diff)
1 file changed, 1 insertion, 1 deletion
diff --git a/examples/batched-bench/batched-bench.cpp b/examples/batched-bench/batched-bench.cpp
index 7924db26..b52d6845 100644
--- a/examples/batched-bench/batched-bench.cpp
+++ b/examples/batched-bench/batched-bench.cpp
@@ -88,7 +88,7 @@ int main(int argc, char ** argv) {
 
     llama_model_params model_params = llama_model_default_params();
 
-    const std::vector<float> t_split (LLAMA_MAX_DEVICES, 0.0f);
+    const std::vector<float> t_split(llama_max_devices(), 0.0f);
 
     model_params.n_gpu_layers = n_gpu_layers;
     model_params.tensor_split = t_split.data();
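
The hunk above swaps a compile-time macro for a run-time query when sizing the split table. A minimal, self-contained sketch of that pattern follows. It does not include llama.h: `max_devices_stub()` and `model_params_stub` are hypothetical stand-ins for `llama_max_devices()` and `llama_model_params`, and the device count of 4 is an assumed value chosen only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for llama_max_devices(). The real function is declared
// in llama.h; the value returned here is an assumption for illustration.
static std::size_t max_devices_stub() {
    return 4;
}

// Hypothetical stand-in for the single llama_model_params field used below.
struct model_params_stub {
    const float * tensor_split = nullptr;
};

// Build a zero-filled split table whose length is decided at run time,
// mirroring the patched line in batched-bench.cpp.
static std::vector<float> make_tensor_split(std::size_t n_devices) {
    return std::vector<float>(n_devices, 0.0f);
}

// Point the params at the table. Only a raw pointer is stored, so the
// vector must outlive every use of `params`.
static void attach_tensor_split(model_params_stub & params, const std::vector<float> & split) {
    assert(split.size() == max_devices_stub());
    params.tensor_split = split.data();
}
```

A caller would create the vector first, for example `const auto t_split = make_tensor_split(max_devices_stub());`, and then pass it to `attach_tensor_split`. Because the length now comes from a function call rather than a macro, the caller no longer needs the backend headers to learn the device limit.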