path: root/ggml-rpc.cpp
Age | Commit message | Author
2024-07-27 | Merge mainline llama.cpp (#3) | Kawrakow
* Merging mainline - WIP
* Merging mainline - WIP

  AVX2 and CUDA appear to work. CUDA performance seems slightly (~1-2%) lower, as is so often the case with llama.cpp/ggml after some "improvements" have been made.

* Merging mainline - fix Metal
* Remove check

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2024-06-17 | rpc : fix load/store misaligned addresses (#7948) | Georgi Gerganov
2024-06-13 | rpc : fix ggml_backend_rpc_supports_buft() (#7918) | Radoslav Gerganov
2024-06-13 | move BLAS to a separate backend (#6210) | slaren
* move BLAS to a separate backend
* rename GGML_USE_OPENBLAS to GGML_USE_BLAS
* alloc : reuse same buffer when the same buffer type is used multiple times
* set number of threads automatically for openblas and blis
* sched : print assignments when GGML_SCHED_DEBUG env variable is set
* sched : allow ops with weights on an incompatible buffer type

  This will cause the weight to be copied to a backend that supports the op, which is very costly. The weight should have been stored in a buffer of a backend that can run the op, but llama.cpp cannot do this automatically at the moment.

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-06-03 | llama : offload to RPC in addition to other backends (#7640) | Radoslav Gerganov
* llama : offload to RPC in addition to other backends
* - fix copy_tensor being called on the src buffer instead of the dst buffer
  - always initialize views in the view_src buffer
  - add RPC backend to Makefile build
  - add endpoint to all RPC object names
* add rpc-server to Makefile
* Update llama.cpp

  Co-authored-by: slaren <slarengh@gmail.com>

---------

Co-authored-by: slaren <slarengh@gmail.com>
2024-05-28 | rpc : resource management rework (#7562) | Radoslav Gerganov
* rpc : resource management rework
* address review comments
2024-05-20 | rpc : track allocated buffers (#7411) | Radoslav Gerganov
* rpc : track allocated buffers

  ref: #7407

* rpc : pack rpc_tensor tightly
2024-05-17 | rpc : set SO_REUSEADDR for the server socket (#7320) | Radoslav Gerganov
ref: #7293
2024-05-16 | rpc : add command line arg for specifying backend memory | Radoslav Gerganov
ref: #7293
2024-05-14 | ggml : add RPC backend (#6829) | Radoslav Gerganov
* ggml : add RPC backend

  The RPC backend proxies all operations to a remote server which runs a regular backend (CPU, CUDA, Metal, etc).

* set TCP_NODELAY
* add CI workflows
* Address review comments
* fix warning
* implement llama_max_devices() for RPC
* Address review comments
* Address review comments
* wrap sockfd into a struct
* implement get_alignment and get_max_size
* add get_device_memory
* fix warning
* win32 support
* add README
* readme : trim trailing whitespace
* Address review comments
* win32 fix
* Address review comments
* fix compile warnings on macos