ik_llama.cpp.git - Unnamed repository; edit this file 'description' to name the repository.

Age	Commit message (Collapse)	Author
2024-05-21	llama : add phi3 128K model support (#7225)	liuwei-git
	* add phi3 128k support in convert-hf-to-gguf * add phi3 128k support in cuda * address build warnings on llama.cpp * adjust index value in cuda long rope freq factors * add long rope support in ggml cpu backend * make freq factors only depend on ctx size * remove unused rope scaling type 'su' frin gguf converter * fix flint warnings on convert-hf-to-gguf.py * set to the short freq factor when context size is small than trained context size * add one line of comments * metal : support rope freq_factors * ggml : update ggml_rope_ext API to support freq. factors * backends : add dev messages to support rope freq. factors * minor : style * tests : update to use new rope API * backends : fix pragma semicolons * minor : cleanup * llama : move rope factors from KV header to tensors * llama : remove tmp assert * cuda : fix compile warning * convert : read/write n_head_kv * llama : fix uninitialized tensors --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-05-21	`grammars`: fix resampling logic regression (#7424)	Olivier Chafik

2024-05-21	examples: cache hf model when --model not provided (#7353)	Amir
	* examples: cache hf model when --model not provided * examples: cache hf model when --model not provided * examples: cache hf model when --model not provided * examples: cache hf model when --model not provided * examples: cache hf model when --model not provided
2024-05-21	Tokenizer SPM fixes for phi-3 and llama-spm (bugfix) (#7425)	jaime-m-p
	* Update brute force test: add_special * Update brute force test: default values for add_bos_token and add_eos_token * Enable rtrim when pre-inserting BOS Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Revert "server : fix test regexes"
2024-05-20	Tokenizer SPM fixes for phi-3 and llama-spm (#7375)	jaime-m-p
	* Update brute force test: special tokens * Fix added tokens - Try to read 'added_tokens.json'. - Try to read 'tokenizer_config.json'. - Try to read 'tokenizer.json'. * Fix special tokens rtrim Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * server : fix test regexes
2024-05-20	perplexity: update README FP16 results [no ci] (#7413)	Johannes Gäßler

2024-05-20	server : fix temperature + disable some tests (#7409)	Georgi Gerganov
	* server : fix temperature * server : disable tests relying on parallel determinism * ci : change server Debug -> RelWithDebInfo
2024-05-20	server : tuning tests (#7388)	Georgi Gerganov
	* server : don't pass temperature as string * server : increase timeout * tests : fix the fix 0.8f -> 0.8 ggml-ci * tests : set explicit temperature
2024-05-20	server : return error on too large embedding input (#7389)	Georgi Gerganov

2024-05-20	tests : fix --keep_split -> --keep-split (#7374)	Georgi Gerganov

2024-05-19	quantize : fix --keep-split check (#7374)	Fred Douglas

2024-05-19	server: add test for token probs (#7347)	Johannes Gäßler

2024-05-19	server: fix seed being reported back (#7382)	Johannes Gäßler

2024-05-19	cmake : update android comments (#7341)	Georgi Gerganov

2024-05-18	android : use "ci-android" branch for CI (#7341)	Georgi Gerganov
	* android : use "ci-android" branch for CI * ggml : disable SIMD exp and silu for 32-bit ARM ggml-ci * android : do not fetch, use add_subdirectory instead * cmake : provide binary dir
2024-05-18	server: correct --threads documentation [no ci] (#7362)	Johannes Gäßler

2024-05-18	perplexity : ndot progress and show stats with < 100 tasks (#7348)	strawberrymelonpanda
	Fix floating point error with ndot printing, allow end stats on lower task numbers if multiple-choice tasks.
2024-05-17	rpc : set SO_REUSEADDR for the server socket (#7320)	Radoslav Gerganov
	ref: #7293
2024-05-17	server : add support for the RPC backend (#7305)	Radoslav Gerganov
	ref: #7292
2024-05-17	[Server] Added --verbose option to README [no ci] (#7335)	Leon Knauer

2024-05-16	Revert "server bench: fix bench not waiting for model load (#7284)" (#7334)	Pierrick Hymbert
	This reverts commit 583fd6b000ec9ad1b465b5c98524f4a0ae388077.
2024-05-16	rpc : get available mem for the CPU backend	Radoslav Gerganov
	This can be overridden with the -m command line option ref: #7293
2024-05-16	rpc : add command line arg for specifying backend memory	Radoslav Gerganov
	ref: #7293
2024-05-16	doc: add references to hugging face GGUF-my-repo quantisation web tool. (#7288)	Vaibhav Srivastav
	* chore: add references to the quantisation space. * fix grammer lol. * Update README.md Co-authored-by: Julien Chaumond <julien@huggingface.co> * Update README.md Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> --------- Co-authored-by: Julien Chaumond <julien@huggingface.co> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-05-15	ggml : tag ggml_tensor::backend as deprecated (#7290)	slaren

2024-05-15	embedding : free the batch after execution (#7297)	dm4

2024-05-15	server bench: fix bench not waiting for model load (#7284)	Johannes Gäßler

2024-05-14	server: free sampling contexts on exit (#7264)	Steve Grubb
	* server: free sampling contexts on exit This cleans up last leak found by the address sanitizer. * fix whitespace * fix whitespace
2024-05-14	Revert "move ndk code to a new library (#6951)" (#7282)	Brian
	This reverts commit efc8f767c8c8c749a245dd96ad4e2f37c164b54c.
2024-05-14	ggml : add RPC backend (#6829)	Radoslav Gerganov
	* ggml : add RPC backend The RPC backend proxies all operations to a remote server which runs a regular backend (CPU, CUDA, Metal, etc). * set TCP_NODELAY * add CI workflows * Address review comments * fix warning * implement llama_max_devices() for RPC * Address review comments * Address review comments * wrap sockfd into a struct * implement get_alignment and get_max_size * add get_device_memory * fix warning * win32 support * add README * readme : trim trailing whitespace * Address review comments * win32 fix * Address review comments * fix compile warnings on macos
2024-05-14	move ndk code to a new library (#6951)	Elton Kola

2024-05-14	docs: Fix typo and update description for --embeddings flag (#7026)	Ryuei
	- Change '--embedding' to '--embeddings' in the README - Update the description to match the latest --help output - Added a caution about defining physical batch size
2024-05-14	llava-cli: fix base64 prompt (#7248)	k.h.lai

2024-05-13	perplexity: add BF16 vs. FP16 results (#7150)	Johannes Gäßler

2024-05-13	change default temperature of OAI compat API from 0 to 1 (#7226)	Benjamin Findley
	* change default temperature of OAI compat API from 0 to 1 * make tests explicitly send temperature to OAI API
2024-05-11	fix system prompt handling (#7153)	Xuan Son Nguyen

2024-05-11	server : free llama_batch on exit (#7212)	Steve Grubb
	* [server] Cleanup a memory leak on exit There are a couple memory leaks on exit of the server. This hides others. After cleaning this up, you can see leaks on slots. But that is another patch to be sent after this. * make tab into spaces
2024-05-11	server: fix reported top tokens for temperature 0 (#7203)	Johannes Gäßler

2024-05-11	llama : add Jina Embeddings architecture (#6826)	Joan Fontanals
	* feat: first things to do * feat: create tensors for Jina architecture * fix: use other tensors * feat: embedding gets results * fix: fix usage of ALIBI * fix: clean prints * fix: do some cleanup unused vars * fix: revert changes to Makefile and CMakeLists * fix: revert some changes * fix: fix small detail * fix: fix convert formatting * fix: fix linting and editor * feat: set proper vocab settings * fix: JinaBertForMaskedLM registration * feat: support q_normalization and k_normalization in Jina arch * feat: handle gpt2 tokenizer with Jina architecture * feat: example comments in embedding * feat: rename Jina Bert to Jina Bert V2 * fix: add some changes as per review * feat: proper KQ_pos for Jina embeddings * feat: add capacity to load models ES and DE for Spanish * llama : fix pre-tokenizers * ggml : full ALiBi support * ggml : update ggml_soft_max_ext() CUDA, SYCL * ggml : ggml_flash_attn_ext() support ALiBi (CPU) * ggml : ggml_flash_attn_ext() support ALiBi (Metal) * ggml : fix warning * ggml : ggml_flash_attn_ext() support ALiBi (CUDA) ggml-ci * minor : clean-up * embedding : add warning about missing SEP --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-05-10	llama-bench : add pp+tg test type (#7199)	slaren

2024-05-10	Fix memory bug in grammar parser (#7194)	Justine Tunney
	The llama.cpp grammar parser had a bug where forgetting to add a closing quotation mark to strings would cause parsing to crash. Anyone running a server on a public endpoint is advised to upgrade. To reproduce this bug ./llamafile -m foo.gguf -p bar --grammar 'root::="' Credit for discovering and reporting this issue goes to Eclypsium Security Researcher Richard Johnson <Richard.johnson@eclypsium.com>.
2024-05-10	Main+: optionally allow special tokens from user in interactive mode (#7097)	HanishKVC
	@hanishkvc added a new `--interactive-specials` flag which would allow for inserting special tokens from user side into the embedding stream.
2024-05-10	llava : fix moondream support (#7163)	Andrei
	* Revert "Revert "llava : add support for moondream vision language model (#6899)"" This reverts commit 9da243b36ac0b9d609adfaaa4c8f1cc8c592f737. * Fix num_positions and embeddings initialization
2024-05-10	eval-callback : fix conversion to float (#7184)	slaren

2024-05-09	TypoFix (#7162)	Ahmet Zeer

2024-05-08	convert-hf : save memory with lazy evaluation (#7075)	compilade
	* convert-hf : begin refactoring write_tensor * convert : upgrade to sentencepiece v0.2.0 * convert-hf : remove unused n_dims in extra__tensors convert-hf : simplify MoE weights stacking * convert-hf : flake8 linter doesn't like semicolons * convert-hf : allow unusual model part names For example, loading `model-00001-of-00001.safetensors` now works. * convert-hf : fix stacking MoE expert tensors `torch.stack` and `torch.cat` don't do the same thing. * convert-hf : fix Mamba conversion Tested to work even with a SentencePiece-based tokenizer. * convert : use a string for the SentencePiece tokenizer path * convert-hf : display tensor shape * convert-hf : convert norms to f32 by default * convert-hf : sort model part names `os.listdir` is said to list files in arbitrary order. Sorting the file names should let "model-00009-of-00042.safetensors" be loaded before "model-00010-of-00042.safetensors". * convert-hf : use an ABC for Model again It seems Protocol can't be used as a statically type-checked ABC, because its subclasses also can't be instantiated. (why did it seem to work?) At least there's still a way to throw an error when forgetting to define the `model_arch` property of any registered Model subclasses. * convert-hf : use a plain class for Model, and forbid direct instantiation There are no abstract methods used anyway, so using ABC isn't really necessary. * convert-hf : more consistent formatting of cmdline args * convert-hf : align the message logged for converted tensors * convert-hf : fix Refact conversion * convert-hf : save memory with lazy evaluation * convert-hf : flake8 doesn't like lowercase L as a variable name * convert-hf : remove einops requirement for InternLM2 * convert-hf : faster model parts loading Instead of pre-loading them all into a dict, iterate on the tensors in the model parts progressively as needed in Model.write_tensors Conversion for some architectures relies on checking for the presence of specific tensor names, so for multi-part models, the weight map is read from the relevant json file to quickly get these names up-front. * convert-hf : minor changes for consistency * gguf-py : add tqdm as a dependency It's small, and used for a progress bar in GGUFWriter.write_tensors_to_file
2024-05-08	JSON: [key] -> .at(key), assert() -> GGML_ASSERT (#7143)	Johannes Gäßler

2024-05-08	Revert "llava : add support for moondream vision language model (#6899)"	Georgi Gerganov
	This reverts commit 46e12c4692a37bdd31a0432fc5153d7d22bc7f72.
2024-05-08	server : add themes + favicon (#6848)	JohnnyB
	* Added themes support with two sample themes and a favicon. * Newline * Newline * Newline * Trailing whitespace * Increased opacity for contrast * Increase opacity. Check actions cancelled for some other priority job and I can't seem to manually re-run them, so MOAR OPACITY * Opacity action trigger. Trying to re-trigger the cancelled action. * One more opacity adjustment This Actions pipeline is failing for random issues. * Delete examples/server/themes/buttons_top/completion.js This will be served from the static string built-in to server. * Delete examples/server/themes/buttons_top/index.js This will be served from the static string built-in to server. * Delete examples/server/themes/wild/completion.js This will be served from the static string built-in to server. * Delete examples/server/themes/buttons_top/json-schema-to-grammar.mjs This will be served from the static string built-in to server. * Delete examples/server/themes/wild/index.js This will be served from the static string built-in to server. * Delete examples/server/themes/wild/json-schema-to-grammar.mjs This will be served from the static string built-in to server. * Replaced underscore.
2024-05-08	main : add --conversation / -cnv flag (#7108)	Dawid Potocki