* Typo fix to server's README.md
Fix minor typo ("tonen") in server README.
* server readme grammar/style fixes.
Quickly went through this file looking for inconsistencies in the
presentation of defaults and flag options, and for typos and grammar
issues.
Not perfect, but hopefully improved.
* Update README.md
Remove an extra space before newline.

Co-authored-by: Jonas Holzner <jonas.holzner.external@hensoldt.net>
* initial commit for sealion support
* add sealion support
* minor fix
* q/k layernorm and pos_embd only if required
* Apply suggestions from code review
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* minor : clear whitespaces
---------
Co-authored-by: bryan <bryansiow@aisingapore.org>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* CI: Update actions/checkout to v4
* CI: Update actions/setup-python to v5
* CI: Update actions/upload-artifact to v4
* Create SECURITY.md
Signed-off-by: Joyce <joycebrum@google.com>
* Fix: link on SECURITY.md
Signed-off-by: Joyce <joycebrum@google.com>
* Fix: link on SECURITY.md
Signed-off-by: Joyce <joycebrum@google.com>
* minor
* fix
* fix
---------
Signed-off-by: Joyce <joycebrum@google.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

Co-authored-by: Jared Van Bortel <jared@nomic.ai>
* Add openchat chat template
* Add chat template test for openchat
* Add chat template for vicuna
* Add chat template for orca-vicuna
* Add EOS for vicuna templates
* Combine vicuna chat templates
* Add tests for openchat and vicuna chat templates
* Add chat template for alpaca
* Add separate template name for vicuna-orca
* Remove alpaca, match deepseek with jinja output
* Regenerate chat template test with add_generation_prompt
* Separate deepseek bos from system message
* Match openchat template with jinja output
* Remove BOS token from templates, unprefix openchat
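As a rough illustration of what such a template produces, here is a hedged Python sketch of a vicuna-style format. The exact strings live in llama_chat_apply_template and may differ; this follows the template's common published form.
```python
# Hedged sketch of a vicuna-style chat template (common form; the
# exact strings in llama_chat_apply_template may differ).
def format_vicuna(messages: list[dict]) -> str:
    parts = []
    for m in messages:
        if m["role"] == "system":
            parts.append(m["content"] + "\n\n")
        elif m["role"] == "user":
            parts.append("USER: " + m["content"] + "\n")
        else:  # assistant
            parts.append("ASSISTANT: " + m["content"] + "</s>\n")
    parts.append("ASSISTANT:")  # generation prompt
    return "".join(parts)

print(format_vicuna([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
]))
```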
* ggml : update mul_mat_id to use the same tensor for all the experts
* update cuda
* minor
* update metal
* update test-backend-ops
* fix cuda
* Update ggml-metal.m
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* update convert.py
* update convert-hf-to-gguf.py
* update convert.py for mixtral hf models
* Update convert-hf-to-gguf.py
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* cuda : support non-pow-2 number of experts
* allow quantize to work for split and merged experts models in the same way
* cleanup + disable mmap automatically with split tensors models
* update imatrix
* test-backend-ops : test qwen argsort
* update grok model loading
* llama : add merged experts tensors to the grok tensor map
* minor
* gguf : bump version
* fix quantizing of merged experts
* convert-hf-to-gguf.py : update grok (untested)
* make linter happy
* cuda/argsort : use shared memory instead of pool memory
* convert : fix grok tensor names
* metal : add support for non-pow-2 argsort
* llama : more loader cleanup, better error checking
* cuda : fix warning
* llama : still use mmap for loading old models, but copy the data to a host buffer
* add review note
* llama : remove ffn tensor counting + add sanity check
ggml-ci
* convert : fix handling of n_experts == None
ggml-ci
* imatrix : fix ncall counters
* llama : produce error if imatrix size does not match
* quantize : terminate on errors + trace logs
ggml-ci
* metal : pad shared memory to 16 bytes
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
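To make the first bullet concrete, here is a hedged numpy illustration (not the ggml API) of the merged-experts layout: all expert matrices live in a single 3D tensor, and mul_mat_id conceptually multiplies each input row by the slice selected by its expert id.
```python
import numpy as np

# Illustrative sizes, not taken from any real model
n_expert, n_ff, n_embd, n_tokens = 8, 32, 16, 4

w_exps = np.random.randn(n_expert, n_ff, n_embd)   # one tensor for all experts
x      = np.random.randn(n_tokens, n_embd)         # input rows
ids    = np.random.randint(0, n_expert, n_tokens)  # chosen expert per row

# mul_mat_id, conceptually: each row uses its expert's slice of w_exps
out = np.stack([w_exps[ids[i]] @ x[i] for i in range(n_tokens)])
assert out.shape == (n_tokens, n_ff)
```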
* disable iqx on windows as a workaround
* array instead of global_memory

Flake lock file updates:
• Updated input 'nixpkgs':
'github:NixOS/nixpkgs/44d0940ea560dee511026a53f0e2e2cde489b4d4' (2024-03-23)
→ 'github:NixOS/nixpkgs/d8fe5e6c92d0d190646fb9f1056741a229980089' (2024-03-29)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* ci: server: verify deps are coherent with the commit
* ci: server: change the ref to build since it is now a pull event target
* fixed deprecated address
* fixed deprecated address
* fixed deprecated address
* Added 'Apache-2.0' SPDX license identifier due to 'kompute.cc' submodule licensing. Explanation of licensing method: https://docs.fedoraproject.org/en-US/legal/spdx/#_and_expressions
* Added 'Apache-2.0' SPDX license identifier due to 'kompute.cc' submodule licensing. Explanation of licensing method: https://docs.fedoraproject.org/en-US/legal/spdx/#_and_expressions
* Added 'Apache-2.0' SPDX license identifier due to 'kompute.cc' submodule licensing. Explanation of licensing method: https://docs.fedoraproject.org/en-US/legal/spdx/#_and_expressions
* reverted to only the MIT license
* split by max size
* clean up arg parse
* split: ok
* add dry run option
* error on 0 tensors
* be positive
* remove next_metadata_size
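A hedged sketch of the split-by-max-size idea above (the function and names are illustrative, not the gguf-split implementation): tensors are assigned to shards greedily until the next one would push a shard past the limit, and zero tensors is an error.
```python
# Greedy split of (name, nbytes) tensors into shards of at most max_bytes.
def split_by_max_size(tensors, max_bytes):
    if not tensors:
        raise ValueError("no tensors to split")  # "error on 0 tensors"
    shards, current, size = [], [], 0
    for name, nbytes in tensors:
        if current and size + nbytes > max_bytes:
            shards.append(current)
            current, size = [], 0
        current.append(name)
        size += nbytes
    shards.append(current)
    return shards

# A dry run would print this plan without writing any files.
print(split_by_max_size([("tok_embd", 500), ("blk.0", 800), ("blk.1", 800)], 1000))
# [['tok_embd'], ['blk.0'], ['blk.1']]
```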
* Fix Vulkan no kv offload incoherence
* Add k-quant mul mat mat shaders
* Rework working buffer allocation; reduces VRAM use noticeably
Clean up CPU assist code, replacing it with the ggml-backend offload function
* Default to all dedicated GPUs
* Add fallback to integrated GPUs if no dedicated GPUs are found (selection policy sketched below)
* Add debug info showing which device is allocating memory
* Fix Intel dequant issue
Fix validation issue
* Fix Vulkan GGML_OP_GET_ROWS implementation
* Clean up merge artifacts
* Remove Vulkan warning
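The device-selection bullets above, as a hedged Python sketch (the device records are invented for illustration; the real logic lives in the Vulkan backend): prefer every dedicated GPU, and use integrated GPUs only when no dedicated one exists.
```python
def pick_devices(devices: list[dict]) -> list[dict]:
    # Default to all dedicated (discrete) GPUs ...
    dedicated = [d for d in devices if d["type"] == "discrete"]
    if dedicated:
        return dedicated
    # ... and fall back to integrated GPUs only if none were found.
    return [d for d in devices if d["type"] == "integrated"]

print(pick_devices([{"name": "iGPU", "type": "integrated"}]))  # falls back
```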
* sync : ggml
ggml-ci
* cuda : move GGML_CUDA_DMMV constants to dmmv.cuh
---------
Co-authored-by: slaren <slarengh@gmail.com>
* Support converting xverse models to gguf format.
* 1. Convert xverse models to gguf;
2. Add LLM_ARCH_XVERSE inference in llama.cpp;
3. Add xverse item in Supported models in README.md;
* gguf-py: remove redundant logs
* llama: remove the init_mapping_prefetch custom parameter
* llama.cpp: Include the changes from #6122 to exclude the unused outputs of the last layers.
* - Fix format issues
- Remove duplicate setting of kqv_out in llm_build_kv
* Update llama.cpp
---------
Co-authored-by: willhe <willhe@xverse.cn>
Co-authored-by: willhe <hexin@xverse.cn>

ggml-ci
* readme: add Android UI binding
* Update README.md
* cmake: add explicit metal version options
* Update CMakeLists.txt
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* llama: remove redundant reshape in build_kv_store
This commit removes the reshape of the V matrix in build_kv_store.
The motivation is that the V matrix has the shape:
```console
(gdb) p *v_cur
$46 = {type = GGML_TYPE_F32, backend = GGML_BACKEND_TYPE_CPU,
buffer = 0x0, ne = {4096, 512, 1, 1}, nb = {4, 16384, 8388608,
8388608}, op = GGML_OP_MUL_MAT, op_params = {
0 <repeats 16 times>}, flags = 0, grad = 0x0,
src = {0xb496b0, 0x7ffef1c40950, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
0x0, 0x0}, perf_runs = 0, perf_cycles = 0, perf_time_us = 0,
view_src = 0x0, view_offs = 0, data = 0x0,
name = "Vcur-0", '\000' <repeats 57 times>, extra = 0x0,
padding = "\000\000\000\000\000\000\000"}
```
And after reshaping this tensor we get:
```console
(gdb) p *ggml_reshape_2d(ctx, v_cur, n_embd_v_gqa, n_tokens)
$44 = {type = GGML_TYPE_F32, backend = GGML_BACKEND_TYPE_CPU,
buffer = 0x0, ne = {4096, 512, 1, 1}, nb = {4, 16384, 8388608,
8388608}, op = GGML_OP_RESHAPE, op_params = {
0 <repeats 16 times>}, flags = 0, grad = 0x0,
src = {0x7ffef1c40e00, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
0x0}, perf_runs = 0, perf_cycles = 0, perf_time_us = 0,
view_src = 0x7ffef1c40e00, view_offs = 0, data = 0x0,
name = "Vcur-0 (reshaped)", '\000' <repeats 46 times>, extra = 0x0,
padding = "\000\000\000\000\000\000\000"}
```
I noticed that the `src` and `view_src` fields differ, but the
dimensions are the same. From the code comment it seems the reshape
call is not needed, and the output above motivates removing it (a
numpy analogy is sketched after this commit).
Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
* llama : add assert
---------
Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
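A numpy analogy of the observation above (an analogy only; ggml view tensors differ in detail): reshaping an array to the shape it already has only creates a view, so the call can be dropped in favor of a shape assertion, which is what the commit does.
```python
import numpy as np

n_embd_v_gqa, n_tokens = 4096, 512  # matches the ne values in the dump
v_cur = np.zeros((n_tokens, n_embd_v_gqa), dtype=np.float32)

v = v_cur.reshape(n_tokens, n_embd_v_gqa)  # redundant: same shape
assert v.shape == v_cur.shape and v.base is v_cur  # a view, no new data

# Equivalent without the reshape, guarded by an assert as in the commit:
assert v_cur.shape == (n_tokens, n_embd_v_gqa)
```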
* Allow conversion of Mistral HF models
* Homogenize Llama, Mistral, Mixtral under the same entry.
* Fix tokenizer, permute tensors
* Use the sentencepiece tokenizer, or fall back to hfft (selection sketched below).
* convert-hf : small fix for mypy
* convert-hf : fix duplicated block_count
* convert-hf : add vocab size to metadata
---------
Co-authored-by: Jared Van Bortel <jared@nomic.ai>
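A hedged sketch of that tokenizer selection order (the file names follow Hugging Face conventions; the function itself is illustrative, not the converter's code):
```python
from pathlib import Path

def pick_vocab(model_dir: Path) -> tuple[str, Path]:
    spm = model_dir / "tokenizer.model"  # sentencepiece model
    hft = model_dir / "tokenizer.json"   # HF fast tokenizer ("hfft")
    if spm.exists():
        return "spm", spm   # preferred
    if hft.exists():
        return "hfft", hft  # fallback
    raise FileNotFoundError(f"no tokenizer found in {model_dir}")
```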
* Revisited & updated SYCL build documentation
* removed outdated comment
* Addressed PR comments
* Trimmed trailing whitespace
* added missing newline at end of file
* fix empty bug
* Update MobileVLM-README.md
added more results on devices
* Update MobileVLM-README.md
* Update MobileVLM-README.md
* Update MobileVLM-README.md
* Update MobileVLM-README.md
* Update MobileVLM-README.md
* Update MobileVLM-README.md
* Update examples/llava/MobileVLM-README.md
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Update MobileVLM-README.md
remove gguf links
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

repo (#6365)
* doc: fix outdated default value of batch size
* doc: add doc for ubatch-size
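A hedged sketch of the relationship these docs describe: a logical batch (-b / --batch-size) is evaluated in physical micro-batches (-ub / --ubatch-size), so n_ubatch never exceeds n_batch. The default values below are the documented ones at the time and may change.
```python
# Tokens are consumed in logical batches of n_batch, each evaluated in
# physical micro-batches of n_ubatch (n_ubatch <= n_batch).
def microbatches(n_tokens: int, n_batch: int = 2048, n_ubatch: int = 512):
    assert n_ubatch <= n_batch
    n = min(n_tokens, n_batch)
    for start in range(0, n, n_ubatch):
        yield start, min(start + n_ubatch, n)

print(list(microbatches(1200)))  # [(0, 512), (512, 1024), (1024, 1200)]
```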
Take all dependencies from the cross stage, rather than only stdenv

- The generic /usr/bin/env shebangs are good enough
- Python deps are provisioned in the devShells
- We need to be able to leave python out, at least on windows (it currently breaks eval)

initial nix build for windows using zig
mingwW64 build
removes nix zig windows build
removes nix zig windows build
removed unnecessary glibc.static
removed unnecessary import of pkgs in nix
fixed missing trailing newline on non-windows nix builds
overriding stdenv when cross-compiling to windows in nix
better variables when cross-compiling windows in nix
cross compile windows on macos
removed trailing whitespace
remove unnecessary overwrite of "CMAKE_SYSTEM_NAME" in nix windows build
nix: keep file extension when copying result files during cross compile for windows
nix: better checking for file extensions when using MinGW
nix: using hostPlatform instead of targetPlatform when cross compiling for Windows
using hostPlatform.extensions.executable to extract executable format
* server: bench: init
* server: bench: reduce list of GPU nodes
* server: bench: fix graph, fix output artifact
* ci: bench: add mermaid in case the image cannot be uploaded (sketched below)
* ci: bench: more resilient, more metrics
* ci: bench: trigger build
* ci: bench: fix duration
* ci: bench: fix typo
* ci: bench: fix mermaid values and generated markdown
* typo on the step name
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
* ci: bench: trailing spaces
* ci: bench: move images in a details section
* ci: bench: reduce bullet point size
---------
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
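The mermaid fallback flagged above, as a hedged sketch (the chart type and field names are illustrative, not the workflow's actual output): when the plot image cannot be uploaded, an inline mermaid chart is written into the job summary instead.
```python
FENCE = "`" * 3  # markdown code fence, built up to avoid nesting issues here

def mermaid_xychart(title: str, xs: list, ys: list) -> str:
    # Build a mermaid xychart block that renders directly in GitHub markdown.
    return "\n".join([
        FENCE + "mermaid",
        "xychart-beta",
        f'    title "{title}"',
        "    x-axis [" + ", ".join(map(str, xs)) + "]",
        '    y-axis "tokens/s"',
        "    line [" + ", ".join(map(str, ys)) + "]",
        FENCE,
    ])

print(mermaid_xychart("prompt processing", [1, 2, 4], [101.5, 180.2, 297.9]))
```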
Until https://github.com/ggerganov/llama.cpp/issues/6346 is resolved