2024-01-19winogrande: evaluate log-probs in parallel (#5036)Kawrakow
This is a relatively minor performance tweak resulting in ~10% speedup on my system. Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2024-01-19llama : add CodeShell support (#5016)chiranko
* llama: add codeshell support * llama.cpp: fix codeshell with NeoX rope Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-01-19perplexity: avoid unnecessary allocations and logit copies (#5035)Kawrakow
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2024-01-19perplexity : faster Winogrande via batching (#5024)Georgi Gerganov
* perplexity : faster Winogrande via batching ggml-ci * perplexity : remove unused function * perplexity : only tokenize selected tasks for Winogrande
2024-01-19llama : fix falcon arch for tied output embeddings (#4978)John
* falcon arch fix for tied output embeddings * Update llama.cpp Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Update llama.cpp * Update llama.cpp Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Update llama.cpp --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-01-18cmake : add ggml public headers (#5011)Georgi Gerganov
2024-01-18server : defer tasks when "slot unavailable" (#5018)Xuan Son Nguyen
* server: defer task when no slot is available * remove unnecessary log --------- Co-authored-by: Xuan Son Nguyen <xuanson.nguyen@snowpack.eu>
2024-01-18llama : fix mlock with no-mmap with Metal (#5025)slaren
2024-01-18imatrix : fix assert for src0 non-cont checkGeorgi Gerganov
2024-01-18perplexity : fix winogrande N tasks optionGeorgi Gerganov
2024-01-18scripts : add get-winogrande.shGeorgi Gerganov
2024-01-18convert.py : fix llama/llama2 conversion due to vocab_size=-1 (#5019)David Sommers
PR #4818 (merged last week) reintroduced a config check for vocab_size that was addressed in PR #4258 (merged 2023-11-30). Without the fix, llama2 models can't be converted. The error is: `ValueError: The model's vocab size is set to -1 in params.json. Please update it manually. Maybe 32000?`
2024-01-18HellaSwag: speed up by parallelizing log-prob evaluation (#5020)Kawrakow
For Mistral-7B and fp16, time on my system goes down from 536 seconds to 423 seconds for the full evaluation dataset (10042 tasks). Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2024-01-18perplexity : faster HellaSwag via batching (#5017)Georgi Gerganov
* perplexity : faster HellaSwag ggml-ci * perplexity : clean-up ggml-ci * perplexity : no need for decode_helper ggml-ci * perplexity : add comments * perplexity : option to specify max batched tasks via `n_parallel` * perplexity : remove HellaSwag restriction for n_batch
2024-01-18Add Winogrande evaluation (#5015)Kawrakow
* winogrande: simple implementation It doesn't look like it is working - why? For Mistral-7B it is barely better than random chance (score ~60% for 1267 tasks), while I see Mistral-7B scoring 78.4% on the HF leader board. 1-sigma statistical uncertainty for 1267 tasks is ~1.4, so no way the difference is due to statistics. * winogrande: somewhat better Score for Mistral-7B is now 68.9 on the validation set of winogrande_debiased. Still far from the reported 78.4, but better than what I had before. * winogrande: improving Mistral-7B score is now 73.56. Still not quite 78.4 but getting there. We are also getting a lower score on HellaSwag compared to the HF leader board, so I'm not expecting we will get up to 78.4 anyway. It looks like it is better to skip the choice word(s) when evaluating the average log-likelihood. This kind of makes sense because a more common word (in Winogrande this is often a name) will have a higher probability without knowing about the follow-up context, and this will skew the log-likelihood towards the more common word. We can only do this if the choice words are not last in the sentence. It also looks like it is better to skip the punctuation at the end of the sentence, provided the choice words are not last. * winogrande: add dataset instructions --------- Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
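The scoring scheme described above can be illustrated with a small sketch. This is a hypothetical illustration of the idea (the function, its parameters, and the token ranges are assumptions made for this example), not the code that was merged into perplexity.cpp: each candidate sentence is scored by the average log-likelihood of the tokens that follow the substituted choice word(s), skipping the choice itself and the trailing punctuation, and the candidate with the higher score wins.

```cpp
#include <cstddef>
#include <vector>

// log_probs[i] = log p(token[i] | token[0..i-1]) for the sentence with one of the
// two choices filled in, as computed by the model.
static double score_candidate(const std::vector<double> & log_probs,
                              size_t choice_end,          // first token after the choice word(s)
                              size_t end_punct_begin) {   // first token of the trailing punctuation
    double sum = 0.0;
    size_t n   = 0;
    for (size_t i = choice_end; i < end_punct_begin; ++i) {
        sum += log_probs[i]; // only the follow-up context contributes to the score
        n   += 1;
    }
    return n > 0 ? sum / n : 0.0; // average log-likelihood; higher is better
}
```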
2024-01-18scripts : add helper script to get hellaswag data in txt formatGeorgi Gerganov
2024-01-18metal : fix memory leak, dangling pointer and unused autorel (#5007)Paul Tsochantaris
* Metal memory: Small memory leak on init, dangling pointer, and unused autorelease pool in graph compute * SPM header potential fix * Reverting symlinks
2024-01-17sync : ggmlGeorgi Gerganov
2024-01-17ggml : add IQ2 to test-backend-ops + refactoring (#4990)Georgi Gerganov
* ggml : add IQ2 to test-backend-ops + refactoring ggml-ci * cuda : update supports_op for IQ2 ggml-ci * ci : enable LLAMA_CUBLAS=1 for CUDA nodes ggml-ci * cuda : fix out-of-bounds-access in `mul_mat_vec_q` ggml-ci * tests : avoid creating RNGs for each Q tensor ggml-ci * tests : avoid creating RNGs for each tensor ggml-ci
2024-01-17imatrix : offload to GPU support (#4957)Georgi Gerganov
* backend : add eval callback ggml-ci * backend : group nodes in a single compute when the user doesn't need them * backend : clean-up the implementation ggml-ci * simple : do not perform tensor data copy if not needed * simple : fix * imatrix : offload to GPU support * imatrix : fix ggml_mul_mat_id handling ggml-ci * ci : add imatrix test ggml-ci * ci : rearrange output ggml-ci
2024-01-17backend : add eval callback (#4935)Georgi Gerganov
* backend : add eval callback ggml-ci * backend : group nodes in a single compute when the user doesn't need them * backend : clean-up the implementation ggml-ci * simple : do not perform tensor data copy if not needed * simple : fix * simple : no need for ggml_is_contiguous + fix bool parse * llama : fix callback placement in llama_context_params * backend : avoid double-ask callback calls * simple : restore examples, imatrix will serve as a demo
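A minimal sketch of how the callback can be hooked up, assuming the cb_eval / cb_eval_user_data field names and the (tensor, ask, user_data) callback signature described in the PR; these names are recalled rather than quoted, so check ggml-backend.h and llama.h for the exact declarations:

```cpp
#include "ggml.h"
#include "llama.h"
#include <cstdio>

// Called twice per node: first with ask == true ("do you want this tensor?"),
// then with ask == false once the data has been computed.
static bool my_eval_cb(struct ggml_tensor * t, bool ask, void * user_data) {
    (void) user_data;
    if (ask) {
        return true; // request the data of every node
    }
    printf("computed node: %s\n", ggml_get_name(t));
    return true;     // returning false aborts the graph compute
}

int main() {
    llama_context_params cparams = llama_context_default_params();
    cparams.cb_eval           = my_eval_cb; // assumed field names, per the PR description
    cparams.cb_eval_user_data = nullptr;
    // ... create the model/context and decode as usual; the callback fires per node ...
    return 0;
}
```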
2024-01-17metal : create autorelease pool during library build (#4970)Georgi Gerganov
* metal : create autorelease pool during library build ggml-ci * test : simplify ggml-ci
2024-01-17py : fix whitespaceGeorgi Gerganov
2024-01-17py : fix missing added_tokens_dict for SPM and BPE vocabs (#4971)Georgi Gerganov
* py : fix missing added_tokens_dict for SPM vocab * py : pad with unknown tokens when data is missing ggml-ci * py : fix BPE vocab conversion ggml-ci * py : fix padded dummy tokens (I hope)
2024-01-17llama : use Q4_K for attn_v for Q2_K_S when n_gqa >= 4 (#4996)Kawrakow
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2024-01-17metal : remove unnecessary nil check (#4986)Paul Tsochantaris
2024-01-17llama : fix copy/paste error in llama_sampling_params comment (#4994)David Renshaw
2024-01-16py : remove unnecessary hasattr (#4903)Georgi Gerganov
2024-01-16nix: remove nixConfig from flake.nix (#4984)Philip Taron
2024-01-16finetune : add training data file to log message (#4979)Daniel Bevenius
This commit adds the name of the training data file to the log message printed when the training data is tokenized. The motivation for this change is that it can be useful to show which file is being tokenized when running the finetune example. Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
2024-01-16ggml : importance matrix support for legacy quants (#4969)Kawrakow
* imatrix: adding support for legacy quants * imatrix: guard Q4_0/Q5_0 against ffn_down craziness --------- Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2024-01-16examples : add complete parallel function calling example (#4974)Maximilian Winter
2024-01-16perplexity : fix kv cache handling for hellaswag (#4981)Georgi Gerganov
ggml-ci
2024-01-16flake.lock: update flake-parts, flake-parts/nixpkgs-lib, and nixpkgs (#4920)Georgi Gerganov
Flake lock file updates: • Updated input 'flake-parts': 'github:hercules-ci/flake-parts/34fed993f1674c8d06d58b37ce1e0fe5eebcb9f5' (2023-12-01) → 'github:hercules-ci/flake-parts/07f6395285469419cf9d078f59b5b49993198c00' (2024-01-11) • Updated input 'flake-parts/nixpkgs-lib': 'github:NixOS/nixpkgs/e92039b55bcd58469325ded85d4f58dd5a4eaf58?dir=lib' (2023-11-29) → 'github:NixOS/nixpkgs/b0d36bd0a420ecee3bc916c91886caca87c894e9?dir=lib' (2023-12-30) • Updated input 'nixpkgs': 'github:NixOS/nixpkgs/cfc3698c31b1fb9cdcf10f36c9643460264d0ca8' (2023-12-27) → 'github:NixOS/nixpkgs/317484b1ead87b9c1b8ac5261a8d2dd748a0492d' (2024-01-08) Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2024-01-16metal : localized logic in `ggml_metal_graph_compute` (#4924)Paul Tsochantaris
* Metal: Localized logic in `ggml_metal_graph_compute`, minor performance improvement * Whitespace * Collecting command buffer completions on single thread * Whitespace * Reduce diff noise
2024-01-16android : introduce starter project example (#4926)Neuman Vong
* Introduce starter project for Android Based on examples/llama.swiftui. * Add github workflow * Set NDK version * Only build arm64-v8a in CI * Sync bench code * Rename CI prop to skip-armeabi-v7a * Remove unused tests
2024-01-16metal : replace loop of dispatch_async with dispatch_apply (#4934)Alex Azarov
* Replace loop of dispatch_async with dispatch_apply * Update ggml-metal.m --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
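For illustration only (this is not the ggml-metal.m code), the difference between the two GCD calls looks roughly like the sketch below: dispatch_apply submits all iterations and returns only once every one of them has finished, so no separate completion bookkeeping is needed. The queue choice and iteration count are made up for the example, and the block syntax requires Clang (Objective-C/C++ with blocks, as used by the Metal backend).

```cpp
#include <dispatch/dispatch.h>
#include <cstdio>

int main() {
    const size_t n_cb = 8; // e.g. one iteration per Metal command buffer
    dispatch_queue_t queue = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);

    // Before: a loop of dispatch_async calls, plus extra logic to wait for completion.
    // After: dispatch_apply runs the iterations concurrently and blocks until all are done.
    dispatch_apply(n_cb, queue, ^(size_t i) {
        // encode + commit command buffer i here
        printf("processed iteration %zu\n", i);
    });
    return 0;
}
```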
2024-01-16metal : log `recommendedMaxWorkingSetSize` on iOS 16+ (#4936)Alex Azarov
* metal: Log `recommendedMaxWorkingSetSize` on iOS 16+ * Only log on iOS and macOS, ignoring tvOS and other platforms * Check for Xcode version before using recommendedMaxWorkingSetSize --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-01-16examples : fix and improve docs for the grammar generator (#4909)Maximilian Winter
* Create pydantic-models-to-grammar.py * Added some comments for usage * Refactored Grammar Generator Added example and usage instructions. * Update pydantic_models_to_grammar.py * Update pydantic-models-to-grammar-examples.py * Renamed module and imported it. * Update pydantic-models-to-grammar.py * Renamed file and fixed grammar generator issue. * Fixed some issues and bugs of the grammar generator. Improved documentation. * Update pydantic_models_to_grammar.py
2024-01-16ggml : introduce GGML_CALL function annotation (#4850)Justine Tunney
This change makes it possible to build ggml-cuda.cu and ggml-metal.m as independent dynamic shared objects that may be conditionally linked at runtime in a multiplatform binary. It introduces a GGML_CALL annotation that documents which functions have a cyclic call relationship between the application code and the GPU modules. This change does nothing unless the build defines -DGGML_MULTIPLATFORM, which causes back-references and function pointers to conform to the MS ABI, which is supported by NVCC, ROCm, Xcode, GCC and Clang across platforms.
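As a rough sketch of what such an annotation can look like (the actual macro definition lives in ggml.h; the exact spelling below is an assumption, not a quote), the annotation expands to nothing by default and only forces the MS calling convention when GGML_MULTIPLATFORM is defined:

```cpp
#include <stddef.h>

#ifdef GGML_MULTIPLATFORM
#    if defined(_WIN32)
#        define GGML_CALL                                 // Windows toolchains already use the MS ABI
#    else
#        define GGML_CALL __attribute__((__ms_abi__))     // force the MS ABI elsewhere
#    endif
#else
#    define GGML_CALL                                     // default build: the annotation is a no-op
#endif

// Functions that cross the application <-> GPU-module boundary carry the annotation,
// so their calling convention matches across independently built shared objects, e.g.:
struct ggml_tensor;
GGML_CALL void ggml_backend_tensor_set(struct ggml_tensor * tensor, const void * data,
                                       size_t offset, size_t size);
```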
2024-01-16finetune : use LLAMA_FILE_MAGIC_GGLA (#4961)Daniel Bevenius
This commit replaces the magic number LLAMA_FILE_MAGIC_LORA used in finetune.cpp with LLAMA_FILE_MAGIC_GGLA defined in llama.h. Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
2024-01-16speculative : threading options (#4959)stduhpf
* speculative: expose draft threading * fix usage format * accept -td and -tbd args * speculative: revert default behavior when -td is unspecified * fix trailing whitespace
2024-01-15pass cpu-architecture arguments only to host code (C;C++) (#4943)ngc92
2024-01-15llama : apply classifier-free guidance to logits directly (#4951)David Friehs
2024-01-15awq-py : fix typo in awq-py/README.md (#4947)Victor Z. Peng
2024-01-15cuda : fix dequantize kernel names (#4938)Georgi Gerganov
2024-01-15llama : check for 256 divisibility for IQ2_XS, IQ2_XXS (#4950)Kawrakow
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2024-01-15CUDA: faster dequantize kernels for Q4_0 and Q4_1 (#4938)Kawrakow
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2024-01-14llama : fix missing quotes (#4937)David Pflug
2024-01-14Add ability to use importance matrix for all k-quants (#4930)Kawrakow
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>