path: root/ggml-rpc.cpp
Age | Commit message | Author
2024-07-27 | Merge mainline llama.cpp (#3) | Kawrakow
* Merging mainline - WIP
* Merging mainline - WIP

  AVX2 and CUDA appear to work. CUDA performance seems slightly (~1-2%) lower, as is so often the case with llama.cpp/ggml after some "improvements" have been made.

* Merging mainline - fix Metal
* Remove check

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2024-06-17 | rpc : fix load/store misaligned addresses (#7948) | Georgi Gerganov
2024-06-13 | rpc : fix ggml_backend_rpc_supports_buft() (#7918) | Radoslav Gerganov
2024-06-13 | move BLAS to a separate backend (#6210) | slaren
* move BLAS to a separate backend
* rename GGML_USE_OPENBLAS to GGML_USE_BLAS
* alloc : reuse same buffer when the same buffer type is used multiple times
* set number of threads automatically for openblas and blis
* sched : print assignments when GGML_SCHED_DEBUG env variable is set
* sched : allow ops with weights on an incompatible buffer type

  This will cause the weight to be copied to a backend that supports the op, which is very costly. The weight should have been stored in a buffer of a backend that can run the op, but llama.cpp cannot do this automatically at the moment.

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-06-03 | llama : offload to RPC in addition to other backends (#7640) | Radoslav Gerganov
* llama : offload to RPC in addition to other backends
* - fix copy_tensor being called on the src buffer instead of the dst buffer
  - always initialize views in the view_src buffer
  - add RPC backend to Makefile build
  - add endpoint to all RPC object names
* add rpc-server to Makefile
* Update llama.cpp

  Co-authored-by: slaren <slarengh@gmail.com>

---------

Co-authored-by: slaren <slarengh@gmail.com>
2024-05-28 | rpc : resource management rework (#7562) | Radoslav Gerganov
* rpc : resource management rework
* address review comments
2024-05-20 | rpc : track allocated buffers (#7411) | Radoslav Gerganov
* rpc : track allocated buffers

  ref: #7407

* rpc : pack rpc_tensor tightly
2024-05-17 | rpc : set SO_REUSEADDR for the server socket (#7320) | Radoslav Gerganov
ref: #7293
2024-05-16 | rpc : add command line arg for specifying backend memory | Radoslav Gerganov
ref: #7293
2024-05-14 | ggml : add RPC backend (#6829) | Radoslav Gerganov
* ggml : add RPC backend

  The RPC backend proxies all operations to a remote server which runs a regular backend (CPU, CUDA, Metal, etc).

* set TCP_NODELAY
* add CI workflows
* Address review comments
* fix warning
* implement llama_max_devices() for RPC
* Address review comments
* Address review comments
* wrap sockfd into a struct
* implement get_alignment and get_max_size
* add get_device_memory
* fix warning
* win32 support
* add README
* readme : trim trailing whitespace
* Address review comments
* win32 fix
* Address review comments
* fix compile warnings on macos