| author | slaren <slarengh@gmail.com> | 2023-12-21 21:07:46 +0100 |
|---|---|---|
| committer | GitHub <noreply@github.com> | 2023-12-21 21:07:46 +0100 |
| commit | d232aca5a73b290e218a2e48b91023d5e994203f | |
| tree | e763648880fad8ef44be54c9cb59c9c7dbda4168 /examples/export-lora/export-lora.cpp | |
| parent | 31f27758faf4a4bd08101a57c7ec3a473f771f86 | |
llama : initial ggml-backend integration (#4520)
* llama : initial ggml-backend integration
* add ggml-metal
* cuda backend can be used through ggml-backend with LLAMA_GGML_BACKEND_CUDA_TEST
access all tensor data with ggml_backend_tensor_get/set (see the sketch below)
* add ggml_backend_buffer_clear
zero-init KV cache buffer
* add ggml_backend_buffer_is_host, used to avoid copies if possible when accessing tensor data (sketched below)
* disable gpu backends with ngl 0
* more accurate mlock
* unmap offloaded part of the model
* use posix_fadvise64(.., POSIX_FADV_SEQUENTIAL) to improve performance with mmap (see the mmap sketch below)
* update quantize and lora
* update session copy/set to use ggml-backend
ggml-ci
* use posix_fadvise instead of posix_fadvise64
* ggml_backend_alloc_ctx_tensors_from_buft : remove old print
* llama_mmap::align_offset : use pointers instead of references for out parameters
* restore progress_callback behavior
* move final progress_callback call to load_all_data
* cuda : fix fprintf format string (minor)
* do not offload scales
* llama_mmap : avoid unmapping the same fragments again in the destructor
* remove unnecessary unmap
* metal : add default log function that prints to stderr, cleanup code
ggml-ci
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
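
As a minimal sketch of the tensor access pattern described in the message (the helper name `scale_tensor_via_backend`, the F32 assumption, and the temporary host buffer are illustrative, not code from this commit), all reads and writes go through `ggml_backend_tensor_get`/`ggml_backend_tensor_set`, so the caller does not need to know whether the tensor lives in host or device memory:

```c
// Sketch only: accessing tensor data via ggml-backend, regardless of backend.
// Assumes a tensor that has already been allocated in some backend buffer.
#include <stdlib.h>
#include "ggml.h"
#include "ggml-backend.h"

static void scale_tensor_via_backend(struct ggml_tensor * t, float factor) {
    const size_t nbytes = ggml_nbytes(t);
    float * tmp = malloc(nbytes);

    // copies device -> host when the buffer is not host-accessible
    ggml_backend_tensor_get(t, tmp, 0, nbytes);

    for (int64_t i = 0; i < ggml_nelements(t); i++) {
        tmp[i] *= factor; // assumes an F32 tensor for simplicity
    }

    // copies host -> device when needed
    ggml_backend_tensor_set(t, tmp, 0, nbytes);

    free(tmp);
}
```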
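
A similar sketch for the new buffer helpers (`read_tensor_data` and `clear_kv_cache_buffer` are made-up names): `ggml_backend_buffer_is_host` lets a caller take a direct `memcpy` path when the data is host-addressable, and `ggml_backend_buffer_clear` zero-initializes a buffer such as the KV cache without backend-specific code:

```c
// Sketch only: skip the staging copy when the buffer is host-accessible,
// and zero-initialize a backend buffer (as done for the KV cache).
#include <string.h>
#include "ggml.h"
#include "ggml-backend.h"

static void read_tensor_data(ggml_backend_buffer_t buf, const struct ggml_tensor * t,
                             void * dst, size_t nbytes) {
    if (ggml_backend_buffer_is_host(buf)) {
        // host buffer: tensor data can be read directly
        memcpy(dst, t->data, nbytes);
    } else {
        // device buffer: go through the backend copy path
        ggml_backend_tensor_get(t, dst, 0, nbytes);
    }
}

static void clear_kv_cache_buffer(ggml_backend_buffer_t kv_buf) {
    // zero-init the whole buffer on whatever backend owns it
    ggml_backend_buffer_clear(kv_buf, 0);
}
```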
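
Finally, a sketch of the mmap hint from the posix_fadvise bullets (`map_model_file` and its error handling are illustrative, not the actual `llama_mmap` code): the file descriptor is advised for sequential reads before mapping, which is what the commit does to improve load performance:

```c
// Sketch only: advise sequential access on the model file before mmap-ing it.
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

static void * map_model_file(const char * path, size_t * size_out) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) { return NULL; }

    struct stat st;
    if (fstat(fd, &st) != 0) { close(fd); return NULL; }

    // the commit switched from posix_fadvise64 to the portable posix_fadvise
    posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);

    void * addr = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd); // the mapping stays valid after closing the descriptor

    if (addr == MAP_FAILED) { return NULL; }
    *size_out = (size_t) st.st_size;
    return addr;
}
```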
Diffstat (limited to 'examples/export-lora/export-lora.cpp')
0 files changed, 0 insertions, 0 deletions