ik_llama.cpp.git

Age	Commit message	Author
2024-04-10	docs : how to add a model (#6565)	Pierrick Hymbert
	* docs: how to add a model * docs: model: typo and docs * docs: model: add prevision on RoPE * docs: model: rephrasing README.md * docs: model: rephrasing README.md * docs: model: README.md fix trailing spaces * docs : some fixes * Update README.md --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-04-10	readme : fix ROCm link (#6579)	Artem Zinnatullin

2024-04-10	readme : update UI list (#6560)	sjxx

2024-04-10	readme: fix typo in amdgpu target name (#6573)	Jiří Sejkora

2024-04-09	BERT tokenizer fixes (#6498)	Jared Van Bortel
	Key changes: * BERT conversion: fix abuse of LlamaHfVocab, do not set BOS or EOS * Nomic Embed conversion: pad vocab instead of slicing embedding tensor * llama_tokenize: handle added special tokens like HF does
2024-04-09	sync : ggml	Georgi Gerganov

2024-04-09	server : detect search query to start webchat (#6554)	Ed Lee

2024-04-09	llama : add Command R Plus support (#6491)	Carolinabanana
	* Add Command R Plus GGUF * Add Command R Plus GGUF * Loading works up to LayerNorm2D * Export new tensors in 1D so they are not quantized. * Fix embedding layer based on Noeda's example * Whitespace * Add line * Fix unexpected tokens on MPS. Re-add F16 fix. (Noeda) * dranger003: Fix block index overflow in CUDA dequantizing. * Reverted blocked multiplication code as it still has issues and could affect other Llama arches * export norms as f32 * fix overflow issues during quant and other cleanup * Type convention Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * dranger003: Fix more int overflow during quant. --------- Co-authored-by: S <seast@Ss-Mac-Studio.local> Co-authored-by: S <s@example.com> Co-authored-by: slaren <slarengh@gmail.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-04-09	license : update copyright notice + add AUTHORS (#6405)	Georgi Gerganov
	* license : add AUTHORS * authors : update * scripts : add LICENSE and gen-authors.sh to sync
2024-04-08	llama : fix attention layer count sanity check (#6550)	Georgi Gerganov
	* llama : fix attention layer count sanity check * llama : fix parentheses in attention layer count sanity check There was otherwise a warning when compiling. --------- Co-authored-by: Francis Couture-Harpin <git@compilade.net>
2024-04-08	Comment explaining a decision (#6531)	kunnis

2024-04-08	quantize : fix precedence of cli args (#6541)	Georgi Gerganov

2024-04-08	llama : support negative ith in llama_get_ API (#6519)	Rick G
	* llama_sampling_sample with default args is more naively usable * Batches populated by either llama_batch_get_one or llama_batch_add work with default args * Previously get_one could use the default argument * Previously add should usually have used the last index where logits[idx] == true * This hopefully encourages the use of llama_batch_add * By giving expected results when using default arguments. * Adds "negative indexing" feature to llama_get_logits_ith and llama_get_embeddings_ith * Believed to work with any currently well behaved program * Default arg now works for both cases (previously would give strange results for add case) * Any non-negative number is unaffected and behaves as previously * Negative arguments were previously invalid. * Implemented as a special case of indexing as suggested by @compilade in https://github.com/ggerganov/llama.cpp/pull/6519 * Fixed mismatch type errors * cited in macOS CI tests * Missed in original updates based on PR feedback in https://github.com/ggerganov/llama.cpp/pull/6519
2024-04-08	llama : save and restore kv cache for single seq id (#6341)	Jan Boon
	* llama : save and restore kv cache for single seq id * remove trailing whitespace * respond error in case there's no space in the kv cache * add kv seq save restore to test case * add --slot-save-path arg to enable save restore and restrict save location * Returning 0 for some cases, instead of asserting. * cleanup error cases * rename sequence state functions * rename state get set functions * add previous function names back in with DEPRECATED notice * update doc * adjust endpoints to preferred style * fix restoring zero cell count * handle seq rm return value * unused param * keep in the size check * fix return types * add server test case for slot save restore * cleanup * add cake * cleanup style * add special * removing a whole sequence never fails * move sequence state file functionality from server to llama to match session api and add version tags * catch exceptions on save as well * error log messages * check types for stricter restore * update server doc * readme : update API changes date * strict filename validation * move include, reject bom as well * also reject empty filename * reject whitespace and trailing dot --------- Co-authored-by: Martin Evans <martindevans@gmail.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-04-08	remove row=1 cond (#6532)	Abhilash Majumder

2024-04-08	Adding KodiBot to UI list (#6535)	Firat
	KodiBot is free and open source ai chat app released under the GNU General Public License.
2024-04-07	Change Windows AMD example to release build to make inference much faster. (#6525)	Mark Fairbairn
2024-04-07	flake.lock: Update (#6517)	Georgi Gerganov
	Flake lock file updates: • Updated input 'flake-parts': 'github:hercules-ci/flake-parts/f7b3c975cf067e56e7cda6cb098ebe3fb4d74ca2' (2024-03-01) → 'github:hercules-ci/flake-parts/9126214d0a59633752a136528f5f3b9aa8565b7d' (2024-04-01) • Updated input 'flake-parts/nixpkgs-lib': 'github:NixOS/nixpkgs/1536926ef5621b09bba54035ae2bb6d806d72ac8?dir=lib' (2024-02-29) → 'github:NixOS/nixpkgs/d8fe5e6c92d0d190646fb9f1056741a229980089?dir=lib' (2024-03-29) • Updated input 'nixpkgs': 'github:NixOS/nixpkgs/d8fe5e6c92d0d190646fb9f1056741a229980089' (2024-03-29) → 'github:NixOS/nixpkgs/fd281bd6b7d3e32ddfa399853946f782553163b5' (2024-04-03) Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2024-04-07	Add GritLM as supported models. (#6513)	DAN™

2024-04-07	sync : ggml	Georgi Gerganov

2024-04-07	ggml: bypass code incompatible with CUDA < 11.1 (whisper/2020)	Slava Primenko
	`cudaHostRegisterReadOnly` parameter was only introduced in CUDA 11.1 See this issue for more details: https://github.com/ggerganov/examples/whisper/whisper.cpp/issues/2007
2024-04-07	scripts : sync ggml-cuda folder	Georgi Gerganov

2024-04-07	Run make to build the project (#6457)	limitedAtonement

2024-04-07	support/fix OPs GGML_TYPE_IQ4_NL, GGML_TYPE_IQ4_XS, GGML_TYPE_IQ3_XXS, GGML_TYPE_IQ3_S, GGML_TYPE_IQ2_XXS, GGML_TYPE_IQ2_XS, GGML_TYPE_IQ2_S, GGML_TYPE_IQ1_S, GGML_TYPE_IQ1_M (#6521)	Neo Zhang Jianyu
2024-04-06	sync : ggml	Georgi Gerganov

2024-04-06	backend : fix typo in scheduler documentation (ggml/781)	Daniel Bevenius
	Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
2024-04-06	Tests: Added integration tests for GBNF parser (#6472)	Clint Herron
	* Added integration tests for GBNF parser to validate correctness of parsing, as well as correctness of string matching. Intended for use to pin behavior while working on performance improvements. * Fixing whitespace errors and cleaning error message alert to be clearer. * Removing hacky include to llama.cpp from grammar integration test now that needed functions are available via internal API. * Comment cleanup. * Reorganizing tests for readability. * Cleaning up debug message to make a bit more sense.
2024-04-06	ci: bench: support sse and fix prompt processing time / server: add tokens usage in stream OAI response (#6495)	Pierrick Hymbert
	* ci: bench: support sse and fix prompt processing time server: add tokens usage in stream mode * ci: bench: README.md EOL * ci: bench: remove total pp and tg as it is not accurate * ci: bench: fix case when there is no token generated * ci: bench: change to the 95 percentile for pp and tg as it is closer to what the server exports in metrics * ci: bench: fix finish reason rate
2024-04-05	gguf.py : add licence and version to gguf writer (#6504)	Brian

2024-04-05	readme : update UI list (#6503)	Hoang Nguyen
	* Add MindMac to UI list * Update proprietary description Co-authored-by: slaren <slarengh@gmail.com> --------- Co-authored-by: slaren <slarengh@gmail.com>
2024-04-05	bench : make n_batch and n_ubatch configurable in Batched bench (#6500)	Ting Sun
	* bench: make n_batch and n_ubatch configurable * bench: update doc for batched bench
2024-04-05	[SYCL] Fixed minor bug when enabling FP16 for non intel targets (#6464)	Ouadie EL FAROUKI
	* moved INTEL_MKL guard from gemm_impl to gemm (wrapper) * Update ggml-sycl.cpp Co-authored-by: AidanBeltonS <87009434+AidanBeltonS@users.noreply.github.com> --------- Co-authored-by: AidanBeltonS <87009434+AidanBeltonS@users.noreply.github.com>
2024-04-04	readme : add Dot to UI list (#6487)	alexpinel

2024-04-04	readme : fix typo (#6481)	Jun Jie

2024-04-04	server: add cURL support to server Dockerfiles (#6474)	Ed Lepedus
	* server: add cURL support to `full.Dockerfile` * server: add cURL support to `full-cuda.Dockerfile` and `server-cuda.Dockerfile` * server: add cURL support to `full-rocm.Dockerfile` and `server-rocm.Dockerfile` * server: add cURL support to `server-intel.Dockerfile` * server: add cURL support to `server-vulkan.Dockerfile` * fix typo in `server-vulkan.Dockerfile` Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-04-04	ci: exempt master branch workflows from getting cancelled (#6486)	Minsoo Cheong
	* ci: exempt master branch workflows from getting cancelled * apply to bench.yml
2024-04-04	build CI: Name artifacts (#6482)	Ewout ter Hoeven
	Name the artifacts in the build CI, so that they get uploaded with separate names, instead of all put into the same `artifact` ZIP. It might be possible to further simplify the packing step (in future PRs).
2024-04-04	server: allow penalizing repetition of newlines on server webpage (#6431)	Shakhar Dasgupta

2024-04-04	ci: bench fix concurrency for workflow trigger dispatch with sha1 (#6478)	Pierrick Hymbert

2024-04-04	Correct README link (#6458)	limitedAtonement
	README is called README.md.
2024-04-04	ci: bench: add more ftype, fix triggers and bot comment (#6466)	Pierrick Hymbert
	* ci: bench: change trigger path to not spawn on each PR * ci: bench: add more file type for phi-2: q8_0 and f16. - do not show the comment by default * ci: bench: add seed parameter in k6 script * ci: bench: artefact name perf job * Add iteration in the commit status, reduce again the autocomment * ci: bench: add per slot metric in the commit status * Fix trailing spaces
2024-04-04	common: remove duplicate check for curl (#6471)	Daniel Bevenius
	This commit removes one of the two identical checks for curl being NULL in llama_load_model_from_url. Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
2024-04-04	examples : add GBNF validator program (#5948)	Clint Herron
	* Revising GBNF validator program to be much simpler. * Changing from streams to using cstdio * Adding final newline character.
2024-04-04	server : remove obsolete --memory-f32 option	Georgi Gerganov

2024-04-04	server : add option to disable KV offload (#6468)	Xiao-Yong Jin

2024-04-04	convert : fix for lint error complaining of bare except (#6470)	Clint Herron

2024-04-03	A few small fixes to server's README docs (#6428)	Fattire
	* Typo fix to server's README.md Fix minor typo ("tonen") in server README. * server readme grammar/style fixes. Quickly went through this file to look for inconsistencies in presentation of defaults, flag options, and looked for typos and grammar issues. Not perfect, but hopefully improved. * Update README.md Remove an extra space before newline.
2024-04-03	server : handle exception on wrong type in request (#6452)	JH23X
	Co-authored-by: Jonas Holzner <jonas.holzner.external@hensoldt.net>
2024-04-03	llama : add SEA-LION support (#6448)	bryanSwk
	* initial commit for sealion support * add sealion support * minor fix * q/k ln and pos_embd only if required * Apply suggestions from code review Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * minor : clear whitespaces --------- Co-authored-by: bryan <bryansiow@aisingapore.org> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-04-03	ci : update checkout, setup-python and upload-artifact to latest (#6456)	Ewout ter Hoeven
	* CI: Update actions/checkout to v4 * CI: Update actions/setup-python to v5 * CI: Update actions/upload-artifact to v4