Age | Commit message (Collapse) | Author |
|
* Merging mainline - WIP
* Merging mainline - WIP
AVX2 and CUDA appear to work.
CUDA performance seems slightly (~1-2%) lower as it is so often
the case with llama.cpp/ggml after some "improvements" have been made.
* Merging mainline - fix Metal
* Remove check
---------
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
|
|
|
|
|
|
* move BLAS to a separate backend
* rename GGML_USE_OPENBLAS to GGML_USE_BLAS
* alloc : reuse same buffer when the same buffer type if used multiple times
* set number of threads automatically for openblas and blis
* sched : print assignments when GGML_SCHED_DEBUG env variable is set
* sched : allow ops with weights on an incompatible buffer type
This will cause the weight to be copied to a backend that supports the
op, which is very costly. The weight should have been stored in a buffer
of a backend that can run the op, but llama.cpp cannot do this
automatically at the moment.
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
|
|
* llama : offload to RPC in addition to other backends
* - fix copy_tensor being called on the src buffer instead of the dst buffer
- always initialize views in the view_src buffer
- add RPC backend to Makefile build
- add endpoint to all RPC object names
* add rpc-server to Makefile
* Update llama.cpp
Co-authored-by: slaren <slarengh@gmail.com>
---------
Co-authored-by: slaren <slarengh@gmail.com>
|
|
* rpc : resource management rework
* address review comments
|
|
* rpc : track allocated buffers
ref: #7407
* rpc : pack rpc_tensor tightly
|
|
ref: #7293
|
|
ref: #7293
|
|
* ggml : add RPC backend
The RPC backend proxies all operations to a remote server which runs a
regular backend (CPU, CUDA, Metal, etc).
* set TCP_NODELAY
* add CI workflows
* Address review comments
* fix warning
* implement llama_max_devices() for RPC
* Address review comments
* Address review comments
* wrap sockfd into a struct
* implement get_alignment and get_max_size
* add get_device_memory
* fix warning
* win32 support
* add README
* readme : trim trailing whitespace
* Address review comments
* win32 fix
* Address review comments
* fix compile warnings on macos
|