offload checks in llama.cpp (#4240)
* ggml : use blas even if src0 is not F32
* llama : use n_threads_batch only when n_tokens >= 32
ggml-ci
* llama : revert n_threads_batch logic
ggml-ci
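
The first bullet loosens ggml's BLAS gate. A hedged sketch of such an eligibility check (names follow ggml's conventions; the exact conditions in ggml.c may differ): dropping the F32-only requirement on src0 lets F16/quantized tensors take the BLAS path after dequantization to F32.

    #include "ggml.h"

    // Illustrative only: BLAS-eligibility check for a mul_mat.
    // Previously this would also require src0->type == GGML_TYPE_F32;
    // without that condition, other src0 types can be dequantized to
    // F32 before the sgemm call.
    static bool mul_mat_use_blas(const struct ggml_tensor * src0,
                                 const struct ggml_tensor * src1,
                                 const struct ggml_tensor * dst) {
        const int64_t ne10 = src1->ne[0]; // shared inner dimension
        const int64_t ne0  = dst->ne[0];
        const int64_t ne1  = dst->ne[1];

        // BLAS only pays off for sufficiently large matrices
        return ggml_is_contiguous(src0) &&
               ggml_is_contiguous(src1) &&
               ne0 >= 32 && ne1 >= 32 && ne10 >= 32;
    }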
(#3970)
* Split CPP generation from build-info query
* Remove blank lines
* Add BUILD_SHARED_LIBS option
* copy to llama.cpp as subdir
* attempt enabling metal, fails
* ggml metal compiles!
* Update README.md
* initial conversion to new format, utf8 errors?
* bug fixes, but now has an invalid memory access :(
* added O3, now has insufficient memory access
* begin sync with master
* update to match latest code, new errors
* fixed it!
* fix for loop conditionals, increase result size
* fix current workflow errors
* attempt a llama.swiftui workflow
* Update .github/workflows/build.yml
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* lookahead : init
* lookahead : generate and store n-grams
* lookahead : use a loop instead of recursion to generate n-grams
* lookahead : initial working implementation
* lookahead : filter repeating n-grams
* lookahead : use deterministic init
* lookahead : add to Makefile
* lookahead : fix a bug in the seq_id of the lookahead tokens
* lookahead : add comments
---------
Co-authored-by: slaren <slarengh@gmail.com>
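
The core of the entry above is an iterative n-gram pool built from the tokens generated so far. A minimal sketch, with illustrative names rather than the exact code in examples/lookahead:

    // Build a pool of N-grams keyed by their first token, using a plain
    // loop (no recursion) and skipping exact repeats.
    #include <algorithm>
    #include <cstdint>
    #include <map>
    #include <vector>

    using llama_token = int32_t;

    static std::map<llama_token, std::vector<std::vector<llama_token>>>
    collect_ngrams(const std::vector<llama_token> & tokens, size_t N) {
        std::map<llama_token, std::vector<std::vector<llama_token>>> pool;
        for (size_t i = 0; i + N <= tokens.size(); ++i) {
            std::vector<llama_token> ngram(tokens.begin() + i, tokens.begin() + i + N);
            auto & bucket = pool[ngram[0]];
            // filter repeating n-grams: store each candidate only once
            if (std::find(bucket.begin(), bucket.end(), ngram) == bucket.end()) {
                bucket.push_back(std::move(ngram));
            }
        }
        return pool;
    }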
get the correct n_orig_ctx in metal
* Use mmap in torch load, prefer .bin files when loading
* Revert .bin > .safetensors preference
* reserve space for codepoints
* improvement for the appended 0
* Add openai-compatible POST /v1/chat/completions API endpoint to server example
* fix code style
* Update server README.md
* Improve server README.md
* Fix server.cpp code style according to review
* server : some style changes
* server : indentation
* server : enable special tokens during tokenization by default
* server : minor code style
* server : change random string generator
* straightforward /v1/models endpoint
---------
Co-authored-by: kir-gadjello <111190790+kir-gadjello@users.noreply.github.com>
Co-authored-by: Tobi Lütke <tobi@Tobis-MacBook-Pro.local>
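
A minimal client sketch for the new endpoint (assumes a server listening on localhost:8080; cpp-httplib is the HTTP library the server example already bundles, and the body follows the OpenAI chat format):

    #include "httplib.h" // cpp-httplib, vendored with the server example
    #include <cstdio>
    #include <string>

    int main() {
        httplib::Client cli("localhost", 8080);
        const std::string body = R"({
            "messages": [{"role": "user", "content": "Hello!"}]
        })";
        auto res = cli.Post("/v1/chat/completions", body, "application/json");
        if (res && res->status == 200) {
            printf("%s\n", res->body.c_str());
        }
        return 0;
    }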
* ggml-cuda : support stablelm rope
* remove unused freq_base kernel parameter
* add n_dims parameter to llm_build_k_shift, default to n_rot via overload
* llama : fix llm_build_k_shift args
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
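
The third bullet is a plain overload trick; sketched here with dummy parameters (the real llm_build_k_shift takes graph and context arguments):

    #include <cstdio>

    // Illustrative: explicit n_dims, defaulted to n_rot via an overload so
    // existing call sites stay unchanged.
    static void llm_build_k_shift(int n_ctx, int n_rot, int n_dims) {
        printf("K shift: n_ctx=%d n_rot=%d n_dims=%d\n", n_ctx, n_rot, n_dims);
    }

    static void llm_build_k_shift(int n_ctx, int n_rot) {
        llm_build_k_shift(n_ctx, n_rot, n_rot); // n_dims defaults to n_rot
    }

    int main() {
        llm_build_k_shift(4096, 128);     // typical call site, unchanged
        llm_build_k_shift(4096, 128, 32); // partial-rotation model overrides n_dims
    }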
llama_token_eos(const struct llama_model *) is currently being passed a variable of type struct llama_context as its parameter.
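
A sketch of the corrected call pattern (llama_get_model is the llama.h accessor that maps a context back to its model):

    #include "llama.h"

    static llama_token get_eos(struct llama_context * ctx) {
        // before: llama_token_eos(ctx) -- wrong type;
        // fetch the model the API actually expects
        return llama_token_eos(llama_get_model(ctx));
    }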
* Update README.md to use PATH for Windows ROCm
* Update README.md
* Update README.md
* Fix incorrect format strings and uninitialized variables.
* Address comments
* Add the missing include statement
* llama : keep track of used KV cells + better KV cache management
* llama : zero KV cache used upon clear
ggml-ci
* llama : allow exporting a view of the KV cache (#4180)
* Allow exporting a view of the KV cache
* Allow dumping the sequences per cell in common
* Track max contiguous cells value and position as well
* Fix max contiguous empty cells index calculation
Make dump functions handle lengths or sequence counts > 10 more gracefully
* Fix off-by-one error in dump_kv_cache_view
* Add doc comments for KV cache view functions
Eliminate cell sequence struct; use llama_seq_id directly
Minor cleanups
* common : add -dkvc arg for enabling kv cache dumps
---------
Co-authored-by: Kerfuffle <44031344+KerfuffleV2@users.noreply.github.com>
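
A hedged usage sketch of the view exported in #4180 (function and field names as added there; details may have shifted since):

    #include "llama.h"
    #include <cstdio>

    // Illustrative: inspect KV cache occupancy via the view API.
    static void print_kv_usage(struct llama_context * ctx) {
        struct llama_kv_cache_view view = llama_kv_cache_view_init(ctx, /*n_max_seq=*/4);
        llama_kv_cache_view_update(ctx, &view); // refresh counts from the context
        printf("used cells: %d, max contiguous empty run: %d\n",
               view.used_cells, view.max_contiguous);
        llama_kv_cache_view_free(&view);
    }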
Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
* Update README.md
* Update README.md
Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>
---------
Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>
Co-authored-by: Sebastian Cramond <sebby37@users.noreply.github.com>
Disabled rules (see the config sketch after the list):
* E203 Whitespace before ':' - disabled because we often use 'C' style where values are aligned
* E211 Whitespace before '(' - disabled because we often use 'C' style where values are aligned
* E221 Multiple spaces before operator - disabled because we often use 'C' style where values are aligned
* E225 Missing whitespace around operator - disabled because it is violated so often that it has become the de facto standard
* E231 Missing whitespace after ',', ';', or ':' - disabled because we often use 'C' style where values are aligned
* E241 Multiple spaces after ',' - disabled because we often use 'C' style where values are aligned
* E251 Unexpected spaces around keyword / parameter equals - disabled because it is violated so often that it has become the de facto standard
* E261 At least two spaces before inline comment - disabled because it is violated so often that it has become the de facto standard
* E266 Too many leading '#' for block comment - disabled because '##' is sometimes used as a "section" separator
* E501 Line too long - disabled because it is violated so often that it has become the de facto standard
* E701 Multiple statements on one line (colon) - violated only in convert.py when defining abstract methods (we can use '# noqa' instead)
* E704 Multiple statements on one line (def) - violated only in convert.py when defining abstract methods (we can use '# noqa' instead)
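
As a .flake8 file, the list above amounts to the following sketch (the committed file may also set options such as max-line-length):

    [flake8]
    ignore = E203,E211,E221,E225,E231,E241,E251,E261,E266,E501,E701,E704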
* Support special tokens and not adding BOS to prompt in speculative
* Adapt to new should_add_bos function
* Ensure tgt and dft have same add_bos setting
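
The consistency check from the last bullet, sketched (should_add_bos stands in for whichever helper the tree exposes; treat the name as an assumption):

    #include <cstdio>
    #include <cstdlib>

    struct llama_model;

    // Hypothetical helper, standing in for the real one: true if the
    // model's tokenizer expects a leading BOS token.
    bool should_add_bos(const struct llama_model * model);

    // Speculative decoding feeds the same prompt to both models, so they
    // must agree on whether a BOS token is prepended.
    void check_bos_consistency(const struct llama_model * tgt,
                               const struct llama_model * dft) {
        if (should_add_bos(tgt) != should_add_bos(dft)) {
            fprintf(stderr, "error: draft model add_bos must match target model\n");
            exit(1);
        }
    }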
This reverts commit 05e8301e4593e2a67b4bae24f093dd12ce5cc7c2.
* gguf-py : export chat templates
* llama.cpp : escape new lines in gguf kv info prints
* gguf-py : bump version
* gguf-py : check chat_template type
* gguf-py : initialize chat_template
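
On the C++ side the stored template can be read back through the generic metadata accessor (a sketch under the assumption that llama_model_meta_val_str is available; "tokenizer.chat_template" is the GGUF key written by gguf-py):

    #include "llama.h"
    #include <cstdio>
    #include <vector>

    // Illustrative: fetch the chat template that gguf-py now exports.
    static void print_chat_template(const struct llama_model * model) {
        std::vector<char> buf(8192);
        const int32_t n = llama_model_meta_val_str(model, "tokenizer.chat_template",
                                                   buf.data(), buf.size());
        if (n >= 0) {
            printf("chat template: %s\n", buf.data());
        }
    }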
Allow building with Makefile
* ggml-cuda.cu: Clean up warnings when compiling with clang
* ggml-cuda.cu: Move static items into anonymous namespace
* ggml-cuda.cu: Fix use of namespace start macro
* Revert "ggml-cuda.cu: Fix use of namespace start macro"
This reverts commit 26c11490266c096e3e5731e05270a8f73a5b2874.
* Revert "ggml-cuda.cu: Move static items into anonymous namespace"
This reverts commit e29757e0f7535d1ac314300f0324684cc785e06c.
* build: support ppc64le build for make and CMake
* build: keep __POWER9_VECTOR__ ifdef and extend with __powerpc64__
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
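
The second bullet boils down to a guard along these lines (a sketch of the preprocessor pattern, not the exact ggml source):

    #include <stdio.h>

    int main(void) {
    // keep the POWER9-specific path, but also match generic 64-bit PowerPC
    #if defined(__POWER9_VECTOR__) || defined(__powerpc64__)
        printf("powerpc64 path (VSX intrinsics where available)\n");
    #else
        printf("portable fallback\n");
    #endif
        return 0;
    }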
load (#4089)
Co-authored-by: Don Mahurin <@>
Falcon HF compatibility
* logging: improve escaping in yaml output
* logging: include review feedback
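
For context, the kind of escaping involved, as a minimal sketch (illustrative; the real helper lives in the common logging code):

    #include <string>

    // Illustrative: escape a string for a double-quoted YAML scalar.
    static std::string yaml_escape(const std::string & s) {
        std::string out;
        out.reserve(s.size());
        for (const char c : s) {
            switch (c) {
                case '\\': out += "\\\\"; break;
                case '"':  out += "\\\""; break;
                case '\n': out += "\\n";  break;
                case '\t': out += "\\t";  break;
                default:   out += c;      break;
            }
        }
        return out;
    }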