| author | slaren <slarengh@gmail.com> | 2023-12-21 21:07:46 +0100 |
|---|---|---|
| committer | GitHub <noreply@github.com> | 2023-12-21 21:07:46 +0100 |
| commit | d232aca5a73b290e218a2e48b91023d5e994203f | |
| tree | e763648880fad8ef44be54c9cb59c9c7dbda4168 /examples/export-lora/export-lora.cpp | |
| parent | 31f27758faf4a4bd08101a57c7ec3a473f771f86 | |
llama : initial ggml-backend integration (#4520)
* llama : initial ggml-backend integration
* add ggml-metal
* cuda backend can be used through ggml-backend with LLAMA_GGML_BACKEND_CUDA_TEST
access all tensor data with ggml_backend_tensor_get/set (see the sketch below)
* add ggml_backend_buffer_clear
zero-init KV cache buffer
* add ggml_backend_buffer_is_host, used to avoid copies if possible when accessing tensor data (sketched below)
* disable gpu backends with ngl 0
* more accurate mlock
* unmap offloaded part of the model
* use posix_fadvise64(.., POSIX_FADV_SEQUENTIAL) to improve performance with mmap (see the mmap sketch below)
* update quantize and lora
* update session copy/set to use ggml-backend
ggml-ci
* use posix_fadvise instead of posix_fadvise64
* ggml_backend_alloc_ctx_tensors_from_buft : remove old print
* llama_mmap::align_offset : use pointers instead of references for out parameters
* restore progress_callback behavior
* move final progress_callback call to load_all_data
* cuda : fix fprintf format string (minor)
* do not offload scales
* llama_mmap : avoid unmapping the same fragments again in the destructor
* remove unnecessary unmap
* metal : add default log function that prints to stderr, cleanup code
ggml-ci
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
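
As a minimal sketch of the tensor access pattern described in the message (the helper name `scale_tensor_via_backend`, the F32 assumption, and the temporary host buffer are illustrative, not code from this commit), all reads and writes go through `ggml_backend_tensor_get`/`ggml_backend_tensor_set`, so the caller does not need to know whether the tensor lives in host or device memory:

```c
// Sketch only: accessing tensor data via ggml-backend, regardless of backend.
// Assumes a tensor that has already been allocated in some backend buffer.
#include <stdlib.h>
#include "ggml.h"
#include "ggml-backend.h"

static void scale_tensor_via_backend(struct ggml_tensor * t, float factor) {
    const size_t nbytes = ggml_nbytes(t);
    float * tmp = malloc(nbytes);

    // copies device -> host when the buffer is not host-accessible
    ggml_backend_tensor_get(t, tmp, 0, nbytes);

    for (int64_t i = 0; i < ggml_nelements(t); i++) {
        tmp[i] *= factor; // assumes an F32 tensor for simplicity
    }

    // copies host -> device when needed
    ggml_backend_tensor_set(t, tmp, 0, nbytes);

    free(tmp);
}
```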
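
A similar sketch for the new buffer helpers (`read_tensor_data` and `clear_kv_cache_buffer` are made-up names): `ggml_backend_buffer_is_host` lets a caller take a direct `memcpy` path when the data is host-addressable, and `ggml_backend_buffer_clear` zero-initializes a buffer such as the KV cache without backend-specific code:

```c
// Sketch only: skip the staging copy when the buffer is host-accessible,
// and zero-initialize a backend buffer (as done for the KV cache).
#include <string.h>
#include "ggml.h"
#include "ggml-backend.h"

static void read_tensor_data(ggml_backend_buffer_t buf, const struct ggml_tensor * t,
                             void * dst, size_t nbytes) {
    if (ggml_backend_buffer_is_host(buf)) {
        // host buffer: tensor data can be read directly
        memcpy(dst, t->data, nbytes);
    } else {
        // device buffer: go through the backend copy path
        ggml_backend_tensor_get(t, dst, 0, nbytes);
    }
}

static void clear_kv_cache_buffer(ggml_backend_buffer_t kv_buf) {
    // zero-init the whole buffer on whatever backend owns it
    ggml_backend_buffer_clear(kv_buf, 0);
}
```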
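
Finally, a sketch of the mmap hint from the posix_fadvise bullets (`map_model_file` and its error handling are illustrative, not the actual `llama_mmap` code): the file descriptor is advised for sequential reads before mapping, which is what the commit does to improve load performance:

```c
// Sketch only: advise sequential access on the model file before mmap-ing it.
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

static void * map_model_file(const char * path, size_t * size_out) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) { return NULL; }

    struct stat st;
    if (fstat(fd, &st) != 0) { close(fd); return NULL; }

    // the commit switched from posix_fadvise64 to the portable posix_fadvise
    posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);

    void * addr = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd); // the mapping stays valid after closing the descriptor

    if (addr == MAP_FAILED) { return NULL; }
    *size_out = (size_t) st.st_size;
    return addr;
}
```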
Diffstat (limited to 'examples/export-lora/export-lora.cpp')
0 files changed, 0 insertions, 0 deletions