* py : fix missing added_tokens_dict for SPM vocab
* py : pad with unknown tokens when data is missing
ggml-ci
* py : fix BPE vocab conversion
ggml-ci
* py : fix padded dummy tokens (I hope)
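The padding fix is easiest to see in miniature: when the model header declares a larger vocab than the vocab file actually provides, the converter fills the tail with dummy unknown tokens. A minimal sketch of the idea, assuming a flat token list (names hypothetical, not the convert script's actual API):

```python
def pad_vocab(tokens: list[str], vocab_size: int, unk_token: str = "<unk>") -> list[str]:
    """Pad a token list with dummy unknown tokens up to vocab_size."""
    if len(tokens) >= vocab_size:
        return tokens
    # Dummy entries get unique names so reverse lookups stay well defined.
    return tokens + [f"{unk_token}_{i}" for i in range(len(tokens), vocab_size)]

print(len(pad_vocab(["<unk>", "<s>", "</s>", "hello"], vocab_size=8)))  # 8
```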
This commit adds the name of the training data file to the log message
printed when the training data is tokenized.
The motivation for this change is that it can be useful to show which
file is being tokenized when running the finetune example.
Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
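The change itself is a one-liner; in Python terms it amounts to something like the following (hypothetical names, shown only to make the log-message change concrete):

```python
import logging

def tokenize_training_data(path: str) -> None:
    # Include the file name so it is clear which file is being tokenized
    # when running the finetune example.
    logging.info("tokenizing training data from %s", path)
    # ... tokenization itself would follow here ...
```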
* imatrix: adding support for legacy quants
* imatrix: guard Q4_0/Q5_0 against ffn_down craziness
---------
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
Flake lock file updates:
• Updated input 'flake-parts':
'github:hercules-ci/flake-parts/34fed993f1674c8d06d58b37ce1e0fe5eebcb9f5' (2023-12-01)
→ 'github:hercules-ci/flake-parts/07f6395285469419cf9d078f59b5b49993198c00' (2024-01-11)
• Updated input 'flake-parts/nixpkgs-lib':
'github:NixOS/nixpkgs/e92039b55bcd58469325ded85d4f58dd5a4eaf58?dir=lib' (2023-11-29)
→ 'github:NixOS/nixpkgs/b0d36bd0a420ecee3bc916c91886caca87c894e9?dir=lib' (2023-12-30)
• Updated input 'nixpkgs':
'github:NixOS/nixpkgs/cfc3698c31b1fb9cdcf10f36c9643460264d0ca8' (2023-12-27)
→ 'github:NixOS/nixpkgs/317484b1ead87b9c1b8ac5261a8d2dd748a0492d' (2024-01-08)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* Metal: Localized logic in `ggml_metal_graph_compute`, minor performance improvement
* Whitespace
* Collecting command buffer completions on single thread
* Whitespace
* Reduce diff noise
* Introduce starter project for Android
Based on examples/llama.swiftui.
* Add github workflow
* Set NDK version
* Only build arm64-v8a in CI
* Sync bench code
* Rename CI prop to skip-armeabi-v7a
* Remove unused tests
* Replace loop of dispatch_async with dispatch_apply
* Update ggml-metal.m
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
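For context, dispatch_apply submits an entire index range to a queue and returns only when every iteration has finished, replacing a hand-rolled loop of dispatch_async calls plus completion tracking. A rough Python analogue of that pattern (illustrative only; the real change is in Objective-C):

```python
from concurrent.futures import ThreadPoolExecutor

def encode_buffer(i: int) -> int:
    return i * i  # stand-in for the per-command-buffer work

with ThreadPoolExecutor() as pool:
    # One batched map over the whole range, blocking until all items
    # complete (analogous to dispatch_apply), instead of submitting each
    # item separately and tracking completions by hand.
    results = list(pool.map(encode_buffer, range(8)))
print(results)
```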
* metal: Log `recommendedMaxWorkingSetSize` on iOS 16+
* Only log on iOS and macOS, ignoring tvOS and other platforms
* Check for Xcode version before using recommendedMaxWorkingSetSize
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Create pydantic-models-to-grammar.py
* Added some comments for usage
* Refactored Grammar Generator
Added example and usage instruction.
* Update pydantic_models_to_grammar.py
* Update pydantic-models-to-grammar-examples.py
* Renamed module and imported it.
* Update pydantic-models-to-grammar.py
* Renamed file and fixed grammar generator issue.
* Fixed some issues and bugs in the grammar generator. Improved documentation.
* Update pydantic_models_to_grammar.py
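The generator's premise: given a pydantic model, emit a GBNF grammar that constrains sampling to JSON matching that model. A toy illustration of the field-to-rule mapping (deliberately simplified; the real logic lives in examples/pydantic_models_to_grammar.py):

```python
from pydantic import BaseModel

class Book(BaseModel):
    title: str
    pages: int

# Map Python types to (pre-existing) GBNF rule names; toy subset only.
TYPE_RULES = {str: "string", int: "number"}

def toy_grammar(model: type[BaseModel]) -> str:
    fields = model.model_fields  # pydantic v2; v1 uses __fields__
    pairs = ' "," '.join(
        f'"\\"{name}\\":" {TYPE_RULES[field.annotation]}'
        for name, field in fields.items()
    )
    return f'root ::= "{{" {pairs} "}}"'

print(toy_grammar(Book))
# root ::= "{" "\"title\":" string "," "\"pages\":" number "}"
```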
This change makes it possible to build ggml-cuda.cu and ggml-metal.m as
independent dynamic shared objects that may be conditionally linked at
runtime in a multiplatform binary. It introduces a GGML_CALL annotation
that documents which functions have a cyclic call relationship between
the application code and the GPU modules.
This change does nothing unless the build defines -DGGML_MULTIPLATFORM,
which causes back-references and function pointers to conform to the MS
ABI, which is supported by NVCC, ROCm, Xcode, GCC, and Clang across
platforms.
This commit replaces the magic number LLAMA_FILE_MAGIC_LORA used in
finetune.cpp with LLAMA_FILE_MAGIC_GGLA defined in llama.h.
Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
* speculative: expose draft threading
* fix usage format
* accept -td and -tbd args
* speculative: revert default behavior when -td is unspecified
* fix trailing whitespace
* llama : minor fix indent
* llama : check LLAMA_TRACE env for extra logging
ggml-ci
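LLAMA_TRACE follows the usual pattern of an environment-variable debug switch; a minimal rendering of the pattern in Python (illustrative, not the C++ implementation):

```python
import os

LLAMA_TRACE = os.environ.get("LLAMA_TRACE") is not None

def trace(msg: str) -> None:
    # Extra logging is emitted only when the env var is set, so the
    # hot path stays quiet by default.
    if LLAMA_TRACE:
        print(f"[trace] {msg}")

trace("decoding batch of 32 tokens")  # prints only if LLAMA_TRACE is set
```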
* Fix ffn_down quantization mix for MoE models
In #4872 I did not consider the part where every third
tensor is quantized with more bits. For MoE this leads to tensors
of the same layer being quantized with a different number of bits,
which is not handled in the inference implementation
(it is assumed that all experts use the same quantization).
* Fix the fix
* Review suggestion
---------
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
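To see why MoE models trip over a "more bits for every third tensor" schedule: a dense model has one ffn_down per layer, so whole layers move together, but an MoE layer has one ffn_down per expert, and a global tensor counter then lands different experts of the same layer on different types. A toy illustration (schedule and counts invented):

```python
# Toy model: a global tensor counter decides which tensors get more bits.
def pick_type(i: int) -> str:
    return "Q5_K" if i % 3 == 0 else "Q4_K"  # invented schedule

n_experts, n_layers = 4, 2
i = 0
for layer in range(n_layers):
    types = []
    for expert in range(n_experts):
        types.append(pick_type(i))
        i += 1
    print(f"layer {layer}: {types}")
# layer 0: ['Q5_K', 'Q4_K', 'Q4_K', 'Q5_K']  <- experts disagree within a layer
```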
* Correctly set support_simdgroup_reduction and support_simdgroup_mm on iPhone/iPad
* log a little bit more info on iOS
* imatrix: load
* imatrix: WIP
* imatrix: Add Q2_K quantization
* imatrix: also guard against Q2_K_S quantization without importance matrix
* imatrix: guard even more against low-bit quantization misuse
---------
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
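The guard boils down to refusing very low-bit quantization when no importance matrix was provided, since Q2_K-class quants degrade badly without one. A sketch of that check (names invented; the real check is in the C++ quantize path):

```python
LOW_BIT_TYPES = {"Q2_K", "Q2_K_S"}  # types that need an importance matrix

def check_quantization(qtype: str, imatrix_path: str | None) -> None:
    if qtype in LOW_BIT_TYPES and imatrix_path is None:
        raise ValueError(
            f"{qtype} quantization without an importance matrix "
            "gives poor quality; pass an imatrix file"
        )

check_quantization("Q2_K", "data.imatrix")  # ok
# check_quantization("Q2_K", None)          # would raise ValueError
```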
* examples : save-load-state: save only required state
* llama : only reserve n_vocab * n_batch at most for logits
llama_decode asserts that only n_batch tokens are passed each call, and
n_ctx is expected to be bigger than n_batch.
* llama : always reserve n_vocab * n_batch for logits
llama_context de-serialization breaks if the contexts have differing
capacity for logits and llama_decode will at maximum resize to
n_vocab * n_batch.
* llama : only save and restore used logits
For a batch size of 512 this reduces the saved state by around 62 MB in
the best case, which can be a lot when saving on every message to allow
regenerating messages.
* llama : use ostringstream and istringstream for save and load
* llama : serialize rng into minimum amount of space required
* llama : break session version due to serialization changes
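The quoted 62 MB figure is consistent with typical LLaMA constants, reserving float32 logits for a full batch rather than only the tokens actually used. A quick back-of-the-envelope check (vocab size assumed, not taken from the source):

```python
n_vocab = 32_000      # typical LLaMA vocab size (assumption)
n_batch = 512
bytes_per_logit = 4   # float32

reserved = n_vocab * n_batch * bytes_per_logit
print(reserved / 1024**2)  # 62.5 MiB of logits at batch size 512
```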
The fix should just be running `sudo apt-get update`.
* add the parameter --no-display-prompt; combined with --log-disable, it displays only the generated tokens
* remove empty line
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* metal : detect more GPU families
* metal : refactor kernel loading
* metal : set kernel family requirements
* metal : fix kernel init + fix compile options
* metal : take into account simdgroup reduction support
* metal : print only skipped kernels
* metal : fix check for simdgroup reduction support
* metal : check for Metal 3
* metal : free allocations
* metal : normalize encoder:setComputePipelineState calls
ggml-ci
* metal : fix Metal3 family check
ggml-ci
* metal : check for simdgroup matrix mul. feature
ggml-ci
* fix deadlock
* don't ruin all whitespace