ik_llama.cpp.git - Unnamed repository; edit this file 'description' to name the repository.

Age	Commit message (Collapse)	Author
2023-12-04	swift : fix prompt tokenization logic (#4321)	Miwa / Ensan

2023-12-04	grammar-parser : fix typo (#4318)	Ikko Eltociear Ashimine
	preceeding -> preceding
2023-12-03	ggml : reuse ggml_get_n_tasks() in ggml_graph_plan() (#4308)	Georgi Gerganov
	* ggml : fix soft max out-of-bounds access ggml-ci * ggml : reuse ggml_get_n_tasks() in ggml_graph_plan() ggml-ci
2023-12-03	ggml : fix soft max out-of-bounds access (#4307)	Georgi Gerganov
	ggml-ci
2023-12-03	server : fix OpenAI API `stop` field to be optional (#4299)	Ed Lee
	(cherry picked from commit Mozilla-Ocho/llamafile@e8c92bcb84ae3bcbf0d617b7ee6a5413bcbd58af)
2023-12-03	py : add grammar to oai like api (#4294)	Rickard Edén

2023-12-03	llama : pad KV cache size (#4280)	Georgi Gerganov
	* llama : pad KV cache size to 32 * metal : try to improve batched decoding
2023-12-01	llama : avoid using "optional" keyword (#4283)	Georgi Gerganov

2023-12-01	llama : support optional tensors (#4283)	Georgi Gerganov

2023-12-01	swift : fix token_to_piece implementation (#4278)	Miwa / Ensan
	* Fix token_to_piece implementation in Swift * Fix errors
2023-12-01	build : enable libstdc++ assertions for debug builds (#4275)	Jared Van Bortel

2023-12-01	llama : support attention bias on LLaMA architecture (#4283)	CausalLM
	* Support attention_bias on LLaMA architecture QKVO bias, should fix InternLM (https://github.com/ggerganov/llama.cpp/issues/3133) and works for LLaMAfied Qwen models (https://github.com/ggerganov/llama.cpp/pull/3743#issuecomment-1825923608). * check existence of qkvo bias while loading llama models Tested on LLaMA2, CUDA and CPU. * Update llama.cpp
2023-12-01	llama : add Qwen support (#4281)	Shijie
	* enable qwen to llama.cpp * llama : do not GPU split bias tensors --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-12-01	llama : fix integer overflow during quantization (#4284)	Georgi Gerganov
	happens with multi-threaded quantization of Qwen-72B ggml-ci
2023-12-01	py : add requirements file for convert-hf-to-gguf.py (#4277)	Daniel Bevenius
	This commit adds a requirements file for the convert-hf-to-gguf.py script, and also add the torch and transformers packages to it. The motivation for this is that currently running convert-hf-to-gguf.py will produce the following error: ```console $ python3 -m venv venv $ source venv/bin/activate (venv) $ pip install -r requirements.txt Collecting numpy==1.24.4 Collecting sentencepiece==0.1.98 Collecting gguf>=0.1.0 Installing collected packages: sentencepiece, numpy, gguf Successfully installed gguf-0.5.1 numpy-1.24.4 sentencepiece-0.1.98 (venv) $ python convert-hf-to-gguf.py --help Traceback (most recent call last): File "llama.cpp/convert-hf-to-gguf.py", line 16, in <module> import torch ModuleNotFoundError: No module named 'torch' ``` With this commit, and using requirements-hf-to-gguf.txt instead of requirements.txt, the script can be run and shows the help output. Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
2023-12-01	ggml : add ggml_soft_max_ext (#4256)	Georgi Gerganov
	* metal : implement soft_max_ext * cuda : implement soft_max_ext * ggml : implement soft_max_ext (CPU) * batched-bench : print threads ggml-ci * metal : simplify soft_max encoding ggml-ci * cuda : use 512 threads for soft_max instead of 32 * ggml : update soft max cpu * cuda : do warp-based block reduce * cuda : increase max block size to 1024 * cuda : fix warp reduction initialization of shared mem * metal : warp-based reduction for soft max kernel * metal : warp-based reduce for rms_norm * metal : simplify soft max kernel ggml-ci * alloc : fix build with debug
2023-12-01	server : add --log-disable to disable logging to file (#4260)	Ziad Ben Hadj-Alouane
	* * add --log-disable to disable logging to file in the server example * * typo fix
2023-12-01	server : add single-client multi-prompt support (#4232)	Ziad Ben Hadj-Alouane
	* * add multiprompt support * * cleanup * * more cleanup * * remove atomicity of id_gen, and change lock_guard to unique_lock on completion requests * * remove all references to mutex_multitasks * Update examples/server/server.cpp Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com> * Update examples/server/server.cpp Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com> * Update examples/server/server.cpp Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com> * Update examples/server/server.cpp Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com> * * change to set --------- Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>
2023-12-01	make : fix Apple clang determination bug (#4272)	WillCorticesAI
	Co-authored-by: Will Findley <findley@gmail.com>
2023-12-01	build : fix build info generation and cleanup Makefile (#3920)	Jared Van Bortel
	* cmake : fix joining of REAL_GIT_DIR * fix includes with help from include-what-you-use * make : remove unneeded deps and add test-rope target * fix C includes in C++ source files * Revert "fix includes with help from include-what-you-use" This reverts commit 635e9fadfd516d4604a0fecf4a854bfb25ad17ae.
2023-11-30	llava : ShareGPT4V compatibility (vision encoder only loading) (#4172)	John
	* ShareGPT4 compatibility (vision encoder only loading) Load only a CLIP vision encoder (as supplied by ShareGPT finetunes) Corrects the argument parsing for --img_mean and --img_std (which were previously not parsed but attempted to access) Defines defaults for img_mean and img_std which are equal to the llava 1.5 CLIP encoder, so you do not have to provide them * Update convert-image-encoder-to-gguf.py
2023-11-30	main : pass LOG_TEE callback to llama.cpp log (#4033)	Andrew Godfrey
	* main : Call llama_log_set to use LOG_TEE * tabs to spaces
2023-11-30	readme : fix (#4135)	vodkaslime
	* fix: readme * chore: resolve comments * chore: resolve comments
2023-11-30	docker : add finetune option (#4211)	Juraj Bednar

2023-11-30	batched.swift : update README.md (#4214)	Miwa / Ensan
	docs: update how to run
2023-11-30	cmake : fix the metal file foder path (#4217)	Li Tan

2023-11-30	readme : fix typo (#4253)	Dawid Wysocki
	llama.cpp uses GitHub Actions, not Gitlab Actions.
2023-11-30	llama : fix alignment of general.name in print meta (#4254)	Daniel Bevenius
	* llama: fix alignment of general.name in print meta This commit fixes the alignment of the general.name field in the llm_load_print_meta function. Currently the output looks like this: ```console llm_load_print_meta: model ftype = mostly Q4_0 llm_load_print_meta: model params = 13.02 B llm_load_print_meta: model size = 6.86 GiB (4.53 BPW) llm_load_print_meta: general.name = LLaMA v2 ``` And with this commit it looks like this: ```console llm_load_print_meta: model ftype = mostly Q4_0 llm_load_print_meta: model params = 13.02 B llm_load_print_meta: model size = 6.86 GiB (4.53 BPW) llm_load_print_meta: general.name = LLaMA v2 ``` Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com> * llama: fix alignment of special tokens Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com> --------- Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
2023-11-30	convert.py : fix llama/llama2 conversion due to vocab_size=-1 (#4258)	slaren

2023-11-30	llama : fix typical sampling (#4261)	tarcey
	Typical sampling was broken because after copying new_candidates into canditates, the "sorted" bool is left at "true", but the new data is no longer sorted according to probability. Patch to set "sorted" to false. Test: Generating with temp=0.0001 (approx. argmax) should generate the same sequence at typical>=1.0 and typical=0.9999 (approx. disabled, but enters the typical sampling codepath).
2023-11-30	py : fix oai proxy (#3972)	rhjdvsgsgks
	* fix oai proxy fix generation not stoped while bot stop talking in chat mode fix possible `slot_id` not exist response for cors (and pre flight) * oai proxy: workaround for some client (such as Chatbox) * use stop as separator to replace hardcoded `\n`
2023-11-29	examples : add readme files	Georgi Gerganov

2023-11-29	readme : add FreeChat (#4248)	Peter Sugihara

2023-11-28	ggml : restore abort() in GGML_ASSERT (#4242)	Jared Van Bortel

2023-11-28	ggml : re-enable BLAS for CPU when src0 != F32 + remove redundant full ↵	Georgi Gerganov
	offload checks in llama.cpp (#4240) * ggml : use blas even if src0 is not F32 * llama : use n_threads_batch only when n_tokens >= 32 ggml-ci * llama : revert n_threads_batch logic ggml-ci
2023-11-27	cmake : fix issue with version info not getting baked into LlamaConfig.cmake ↵	bandoti
	(#3970) * Split CPP generation from build-info query * Remove blank lines * Add BUILD_SHARED_LIBS option
2023-11-27	readme : add Amica to UI list (#4230)	Kasumi

2023-11-27	examples : iOS example with swift ui (#4159)	Bailey Chittle
	* copy to llama.cpp as subdir * attempt enabling metal, fails * ggml metal compiles! * Update README.md * initial conversion to new format, utf8 errors? * bug fixes, but now has an invalid memory access :( * added O3, now has insufficient memory access * begin sync with master * update to match latest code, new errors * fixed it! * fix for loop conditionals, increase result size * fix current workflow errors * attempt a llama.swiftui workflow * Update .github/workflows/build.yml Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-11-26	ggml : fix -Warray-bounds warning with gcc (#4231)	Jared Van Bortel

2023-11-26	lookahead : support `-n -1` infinite generation	Georgi Gerganov

2023-11-26	readme : update hot topics	Georgi Gerganov

2023-11-26	lookahead : add example for lookahead decoding (#4207)	Georgi Gerganov
	* lookahead : init * lookahead : generate and store n-grams * lookahead : use loop instead recursion to generate n-grams * lookahead : initial working implementation * lookahead : filter repeating n-grams * lookahead : use deterministic init * lookahead : add to Makefile * lookahead : fix a bug in the seq_id of the lookahead tokens * lookahead : add comments --------- Co-authored-by: slaren <slarengh@gmail.com>
2023-11-26	metal : fix yarn (#4220)	Xiao-Yong Jin
	get the correct n_orig_ctx in metal
2023-11-25	scripts : Use mmap in torch load (#4202)	Galunid
	* Use mmap in torch load, prefer .bin files when loading * Revert .bin > .safetensors preference
2023-11-25	llama : grammar `reserve` space in `decode_utf8` (#4210)	Marcus Dunn
	* reserve space for codepoints * improvement for the appended 0
2023-11-25	Update docs for yarn_ext_factor <0.0 as unspecified instead of NaN (#4189)	crasm

2023-11-25	readme : update hot topics	Georgi Gerganov

2023-11-25	server : OAI API compatibility (#4198)	Georgi Gerganov
	* Add openai-compatible POST /v1/chat/completions API endpoint to server example * fix code style * Update server README.md * Improve server README.md * Fix server.cpp code style according to review * server : some style changes * server : indentation * server : enable special tokens during tokenization by default * server : minor code style * server : change random string generator * straightforward /v1/models endpoint --------- Co-authored-by: kir-gadjello <111190790+kir-gadjello@users.noreply.github.com> Co-authored-by: Tobi Lütke <tobi@Tobis-MacBook-Pro.local>
2023-11-24	llama : set metal log callback correctly (#4204)	slaren

2023-11-24	ggml-cuda : support stablelm rope (#4156)	slaren
	* ggml-cuda : support stablelm rope * remove unused freq_base kernel parameter * add n_dims parameter to llm_build_k_shift, default to n_rot via overload * llama : fix llm_build_k_shift args --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>