Age  Commit message  Author
2024-04-08  llama : save and restore kv cache for single seq id (#6341)  [Jan Boon]
  * llama : save and restore kv cache for single seq id
  * remove trailing whitespace
  * respond error in case there's no space in the kv cache
  * add kv seq save restore to test case
  * add --slot-save-path arg to enable save restore and restrict save location
  * Returning 0 for some cases, instead of asserting.
  * cleanup error cases
  * rename sequence state functions
  * rename state get set functions
  * add previous function names back in with DEPRECATED notice
  * update doc
  * adjust endpoints to preferred style
  * fix restoring zero cell count
  * handle seq rm return value
  * unused param
  * keep in the size check
  * fix return types
  * add server test case for slot save restore
  * cleanup
  * add cake
  * cleanup style
  * add special
  * removing a whole sequence never fails
  * move sequence state file functionality from server to llama to match session api and add version tags
  * catch exceptions on save as well
  * error log messages
  * check types for stricter restore
  * update server doc
  * readme : update API changes date
  * strict filename validation
  * move include, reject bom as well
  * also reject empty filename
  * reject whitespace and trailing dot
  Co-authored-by: Martin Evans <martindevans@gmail.com>
  Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
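The behavior this commit describes, snapshotting the cells of one sequence id and returning 0 instead of asserting when there is no room on restore, can be sketched with a toy cache. Everything below (names, the dict-of-lists cache, the capacity constant) is purely illustrative and is not the llama.cpp API:

```python
CACHE_CAPACITY = 8  # toy cell budget, an assumption for this sketch

def save_seq(cache: dict, seq_id: int) -> list:
    """Snapshot the cells currently held by one sequence id."""
    return list(cache.get(seq_id, []))

def restore_seq(cache: dict, seq_id: int, state: list) -> int:
    """Restore a snapshot; return 0 when the cache lacks space,
    mirroring the 'return 0 instead of asserting' behavior."""
    used = sum(len(cells) for cells in cache.values())
    free = CACHE_CAPACITY - used + len(cache.get(seq_id, []))
    if len(state) > free:
        return 0  # no space in the kv cache
    cache[seq_id] = list(state)
    return len(state)

cache = {1: ["k0", "k1"]}
snap = save_seq(cache, 1)
cache[1] = []                       # slot was cleared in the meantime
print(restore_seq(cache, 1, snap))  # -> 2
```

The real implementation serializes opaque state buffers through llama.h and tags them with version information, as the commit notes.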
2024-04-08  remove row=1 cond (#6532)  [Abhilash Majumder]
2024-04-08  Adding KodiBot to UI list (#6535)  [Firat]
  KodiBot is a free and open-source AI chat app released under the GNU General Public License.
2024-04-07  Change Windows AMD example to release build to make inference much faster. (#6525)  [Mark Fairbairn]
2024-04-07  flake.lock: Update (#6517)  [Georgi Gerganov]
  Flake lock file updates:
  • Updated input 'flake-parts':
      'github:hercules-ci/flake-parts/f7b3c975cf067e56e7cda6cb098ebe3fb4d74ca2' (2024-03-01)
    → 'github:hercules-ci/flake-parts/9126214d0a59633752a136528f5f3b9aa8565b7d' (2024-04-01)
  • Updated input 'flake-parts/nixpkgs-lib':
      'github:NixOS/nixpkgs/1536926ef5621b09bba54035ae2bb6d806d72ac8?dir=lib' (2024-02-29)
    → 'github:NixOS/nixpkgs/d8fe5e6c92d0d190646fb9f1056741a229980089?dir=lib' (2024-03-29)
  • Updated input 'nixpkgs':
      'github:NixOS/nixpkgs/d8fe5e6c92d0d190646fb9f1056741a229980089' (2024-03-29)
    → 'github:NixOS/nixpkgs/fd281bd6b7d3e32ddfa399853946f782553163b5' (2024-04-03)
  Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2024-04-07  Add GritLM as a supported model. (#6513)  [DAN™]
2024-04-07  sync : ggml  [Georgi Gerganov]
2024-04-07  ggml: bypass code incompatible with CUDA < 11.1 (whisper/2020)  [Slava Primenko]
  The `cudaHostRegisterReadOnly` parameter was only introduced in CUDA 11.1. See this issue for more details: https://github.com/ggerganov/whisper.cpp/issues/2007
2024-04-07  scripts : sync ggml-cuda folder  [Georgi Gerganov]
2024-04-07  Run make to build the project (#6457)  [limitedAtonement]
2024-04-07  support/fix OPs GGML_TYPE_IQ4_NL, GGML_TYPE_IQ4_XS, GGML_TYPE_IQ3_XXS, GGML_TYPE_IQ3_S, GGML_TYPE_IQ2_XXS, GGML_TYPE_IQ2_XS, GGML_TYPE_IQ2_S, GGML_TYPE_IQ1_S, GGML_TYPE_IQ1_M (#6521)  [Neo Zhang Jianyu]
2024-04-06  sync : ggml  [Georgi Gerganov]
2024-04-06  backend : fix typo in scheduler documentation (ggml/781)  [Daniel Bevenius]
  Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
2024-04-06  Tests: Added integration tests for GBNF parser (#6472)  [Clint Herron]
  * Added integration tests for GBNF parser to validate correctness of parsing, as well as correctness of string matching. Intended for use to pin behavior while working on performance improvements.
  * Fixing whitespace errors and cleaning error message alert to be clearer.
  * Removing hacky include to llama.cpp from grammar integration test now that needed functions are available via internal API.
  * Comment cleanup.
  * Reorganizing tests for readability.
  * Cleaning up debug message to make a bit more sense.
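The table-driven accept/reject pattern such integration tests use to pin matcher behavior can be sketched as follows. Python regexes stand in for real GBNF grammars here, and every grammar string and case below is hypothetical:

```python
import re

# Accept/reject cases pinning matcher behavior; regexes stand in
# for GBNF grammars, and all cases are made up for illustration.
CASES = [
    (r"[0-9]+", "12345", True),    # digits accepted
    (r"[0-9]+", "12a45", False),   # stray letter rejected
    (r'"[^"]*"', '"hi"', True),    # quoted string accepted
    (r'"[^"]*"', '"hi', False),    # unterminated string rejected
]

def run_cases():
    """Return the list of cases whose outcome differs from expectation."""
    failures = []
    for pattern, text, expect in CASES:
        got = re.fullmatch(pattern, text) is not None
        if got != expect:
            failures.append((pattern, text, expect, got))
    return failures

print(run_cases())  # -> []
```

Pinning both positive and negative cases this way lets performance work proceed without silently changing what the parser accepts.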
2024-04-06  ci: bench: support sse and fix prompt processing time / server: add tokens usage in stream OAI response (#6495)  [Pierrick Hymbert]
  * ci: bench: support sse and fix prompt processing time; server: add tokens usage in stream mode
  * ci: bench: README.md EOL
  * ci: bench: remove total pp and tg as it is not accurate
  * ci: bench: fix case when there is no token generated
  * ci: bench: change to the 95 percentile for pp and tg as it is closer to what the server exports in metrics
  * ci: bench: fix finish reason rate
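The 95th-percentile summary adopted for pp/tg above can be computed with a simple nearest-rank sketch; the latency samples below are made up for illustration:

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample value with at
    least p percent of the samples at or below it."""
    xs = sorted(samples)
    k = math.ceil(p / 100 * len(xs))
    return xs[max(k, 1) - 1]

latencies_ms = [12, 15, 11, 90, 14, 13, 16, 12, 15, 400]  # made-up samples
print(percentile(latencies_ms, 95))  # -> 400
```

Unlike a total or a mean, a high percentile is robust to runs with no generated tokens and matches the histogram-style metrics the server exports.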
2024-04-05  gguf.py : add licence and version to gguf writer (#6504)  [Brian]
2024-04-05  readme : update UI list (#6503)  [Hoang Nguyen]
  * Add MindMac to UI list
  * Update proprietary description
  Co-authored-by: slaren <slarengh@gmail.com>
2024-04-05  bench : make n_batch and n_ubatch configurable in Batched bench (#6500)  [Ting Sun]
  * bench: make n_batch and n_ubatch configurable
  * bench: update doc for batched bench
2024-04-05  [SYCL] Fixed minor bug when enabling FP16 for non intel targets (#6464)  [Ouadie EL FAROUKI]
  * moved INTEL_MKL guard from gemm_impl to gemm (wrapper)
  * Update ggml-sycl.cpp
  Co-authored-by: AidanBeltonS <87009434+AidanBeltonS@users.noreply.github.com>
2024-04-04  readme : add Dot to UI list (#6487)  [alexpinel]
2024-04-04  readme : fix typo (#6481)  [Jun Jie]
2024-04-04  server: add cURL support to server Dockerfiles (#6474)  [Ed Lepedus]
  * server: add cURL support to `full.Dockerfile`
  * server: add cURL support to `full-cuda.Dockerfile` and `server-cuda.Dockerfile`
  * server: add cURL support to `full-rocm.Dockerfile` and `server-rocm.Dockerfile`
  * server: add cURL support to `server-intel.Dockerfile`
  * server: add cURL support to `server-vulkan.Dockerfile`
  * fix typo in `server-vulkan.Dockerfile`
  Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-04-04  ci: exempt master branch workflows from getting cancelled (#6486)  [Minsoo Cheong]
  * ci: exempt master branch workflows from getting cancelled
  * apply to bench.yml
2024-04-04  build CI: Name artifacts (#6482)  [Ewout ter Hoeven]
  Name the artifacts in the build CI so that they are uploaded under separate names instead of all being put into the same `artifact` ZIP. It might be possible to further simplify the packing step (in future PRs).
2024-04-04  server: allow penalizing repetition of newlines on server webpage (#6431)  [Shakhar Dasgupta]
2024-04-04  ci: bench fix concurrency for workflow trigger dispatch with sha1 (#6478)  [Pierrick Hymbert]
2024-04-04  Correct README link (#6458)  [limitedAtonement]
  README is called README.md.
2024-04-04  ci: bench: add more ftype, fix triggers and bot comment (#6466)  [Pierrick Hymbert]
  * ci: bench: change trigger path to not spawn on each PR
  * ci: bench: add more file type for phi-2: q8_0 and f16
    - do not show the comment by default
  * ci: bench: add seed parameter in k6 script
  * ci: bench: artefact name perf job
  * Add iteration in the commit status, reduce again the autocomment
  * ci: bench: add per slot metric in the commit status
  * Fix trailing spaces
2024-04-04  common: remove duplicate check for curl (#6471)  [Daniel Bevenius]
  This commit removes one of the two identical checks for curl being NULL in llama_load_model_from_url.
  Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
2024-04-04  examples : add GBNF validator program (#5948)  [Clint Herron]
  * Revising GBNF validator program to be much simpler.
  * Changing from streams to using cstdio
  * Adding final newline character.
2024-04-04  server : remove obsolete --memory-f32 option  [Georgi Gerganov]
2024-04-04  server : add option to disable KV offload (#6468)  [Xiao-Yong Jin]
2024-04-04  convert : fix for lint error complaining of bare except (#6470)  [Clint Herron]
2024-04-03  A few small fixes to server's README docs (#6428)  [Fattire]
  * Typo fix to server's README.md: fix minor typo ("tonen") in server README.
  * Server readme grammar/style fixes: went through this file to look for inconsistencies in the presentation of defaults and flag options, and for typos and grammar issues. Not perfect, but hopefully improved.
  * Update README.md: remove an extra space before newline.
2024-04-03  server : handle exception on wrong type in request (#6452)  [JH23X]
  Co-authored-by: Jonas Holzner <jonas.holzner.external@hensoldt.net>
2024-04-03  llama : add SEA-LION support (#6448)  [bryanSwk]
  * initial commit for sealion support
  * add sealion support
  * minor fix
  * q/k ln and pos_embd only if required
  * Apply suggestions from code review
  * minor : clear whitespaces
  Co-authored-by: bryan <bryansiow@aisingapore.org>
  Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-04-03  ci : update checkout, setup-python and upload-artifact to latest (#6456)  [Ewout ter Hoeven]
  * CI: Update actions/checkout to v4
  * CI: Update actions/setup-python to v5
  * CI: Update actions/upload-artifact to v4
2024-04-03  server: add cURL support to `server.Dockerfile` (#6461)  [Ed Lepedus]
2024-04-03  readme : add feature-rich rust bindings (#6465)  [Francisco Melo]
2024-04-03  security : create policy (#6354)  [Joyce]
  * Create SECURITY.md
  * Fix: link on SECURITY.md
  * minor fixes
  Signed-off-by: Joyce <joycebrum@google.com>
  Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-04-03  Missing tokenizer.model error during gguf conversion (#6443)  [Abhishek Gopinath K]
  Co-authored-by: Jared Van Bortel <jared@nomic.ai>
2024-04-03  Add OpenChat, Alpaca, Vicuna chat templates (#6397)  [kaizau]
  * Add openchat chat template
  * Add chat template test for openchat
  * Add chat template for vicuna
  * Add chat template for orca-vicuna
  * Add EOS for vicuna templates
  * Combine vicuna chat templates
  * Add tests for openchat and vicuna chat templates
  * Add chat template for alpaca
  * Add separate template name for vicuna-orca
  * Remove alpaca, match deepseek with jinja output
  * Regenerate chat template test with add_generation_prompt
  * Separate deepseek bos from system message
  * Match openchat template with jinja output
  * Remove BOS token from templates, unprefix openchat
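The Vicuna-style layout these templates follow can be sketched in a few lines. The exact system-prompt handling, spacing, and EOS placement below are assumptions for illustration; the authoritative formatting lives in the chat-template code and its jinja references:

```python
def vicuna_format(messages):
    """Render a chat in an assumed Vicuna-style layout: an optional
    system line, then USER:/ASSISTANT: turns, EOS after each reply,
    and a trailing generation prompt."""
    out = []
    for m in messages:
        if m["role"] == "system":
            out.append(m["content"])
        elif m["role"] == "user":
            out.append("USER: " + m["content"])
        else:
            out.append("ASSISTANT: " + m["content"] + "</s>")
    out.append("ASSISTANT:")  # generation prompt for the next reply
    return "\n".join(out)

chat = [
    {"role": "system", "content": "A chat between a user and an assistant."},
    {"role": "user", "content": "Hello!"},
]
print(vicuna_format(chat))
```

Regenerating test fixtures against the reference jinja output, as the bullets above describe, is what keeps a hand-written formatter like this honest.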
2024-04-03  readme : update hot topics  [Georgi Gerganov]
2024-04-03  ggml : mul_mat_id use the same tensor for all the experts (#6387)  [slaren]
  * ggml : update mul_mat_id to use the same tensor for all the experts
  * update cuda
  * minor
  * update metal
  * update test-backend-ops
  * fix cuda
  * Update ggml-metal.m
  * update convert.py
  * update convert-hf-to-gguf.py
  * update convert.py for mixtral hf models
  * Update convert-hf-to-gguf.py
  * cuda : support non-pow-2 number of experts
  * allow quantize to work for split and merged experts models in the same way
  * cleanup + disable mmap automatically with split tensors models
  * update imatrix
  * test-backend-ops : test qwen argsort
  * update grok model loading
  * llama : add merged experts tensors to the grok tensor map
  * minor
  * gguf : bump version
  * fix quantizing of merged experts
  * convert-hf-to-gguf.py : update grok (untested)
  * make linter happy
  * cuda/argsort : use shared memory instead of pool memory
  * convert : fix grok tensor names
  * metal : add support for non-pow-2 argsort
  * llama : more loader cleanup, better error checking
  * cuda : fix warning
  * llama : still use mmap for loading old models, but copy the data to a host buffer
  * add review note
  * llama : remove ffn tensor counting + add sanity check ggml-ci
  * convert : fix handling of n_experts == None ggml-ci
  * imatrix : fix ncall counters
  * llama : produce error if imatrix size does not match
  * quantize : terminate on errors + trace logs ggml-ci
  * metal : pad shared memory to 16 bytes
  Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
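The core mul_mat_id idea in that commit, one merged tensor holding every expert with the expert chosen per row by an id, can be sketched in plain Python. The shapes and data below are toy values, and the real ggml op works on quantized tensors across backends:

```python
def mat_vec(w, x):
    """Plain matrix-vector product over nested lists."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]

def mul_mat_id(experts, ids, xs):
    """Route each input vector through the expert selected by its id.
    'experts' is one stacked [n_expert][rows][cols] structure,
    mirroring the single merged experts tensor."""
    return [mat_vec(experts[i], x) for i, x in zip(ids, xs)]

# Two toy 2x2 experts stacked into one structure.
experts = [
    [[1, 0], [0, 1]],   # expert 0: identity
    [[2, 0], [0, 2]],   # expert 1: doubles the input
]
ids = [1, 0]            # per-row expert selection
xs = [[3, 4], [5, 6]]
print(mul_mat_id(experts, ids, xs))  # -> [[6, 8], [5, 6]]
```

Keeping all experts in one tensor is what lets quantization and loading treat split and merged expert models uniformly, as the bullets above note.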
2024-04-03  [SYCL] Disable iqx on windows as WA (#6435)  [Meng, Hengyu]
  * disable iqx on windows as WA
  * array instead of global_memory
2024-04-01  flake.lock: Update (#6402)  [Georgi Gerganov]
  Flake lock file updates:
  • Updated input 'nixpkgs':
      'github:NixOS/nixpkgs/44d0940ea560dee511026a53f0e2e2cde489b4d4' (2024-03-23)
    → 'github:NixOS/nixpkgs/d8fe5e6c92d0d190646fb9f1056741a229980089' (2024-03-29)
  Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2024-04-01  compare-llama-bench.py: fix long hexsha args (#6424)  [Johannes Gäßler]
2024-04-01  ci: server: verify deps are coherent with the commit (#6409)  [Pierrick Hymbert]
  * ci: server: verify deps are coherent with the commit
  * ci: server: change the ref to build as now it's a pull event target
2024-03-31  readme : update hot topics  [Georgi Gerganov]
2024-03-30  ci: bench: fix Resource not accessible by integration on PR event (#6393)  [Pierrick Hymbert]