Age        | Commit message                                                    | Author
2024-01-16 | examples : fix and improv docs for the grammar generator (#4909)  | Maximilian Winter
2024-01-16 | ggml : introduce GGML_CALL function annotation (#4850)            | Justine Tunney
2024-01-16 | finetune : use LLAMA_FILE_MAGIC_GGLA (#4961)                      | Daniel Bevenius
2024-01-16 | speculative : threading options (#4959)                           | stduhpf
2024-01-15 | pass cpu-architecture arguments only to host code (C;C++) (#4943) | ngc92
2024-01-15 | llama : apply classifier-free guidance to logits directly (#4951) | David Friehs
2024-01-15 | awq-py : fix typo in awq-py/README.md (#4947)                     | Victor Z. Peng
2024-01-15 | cuda : fix dequantize kernel names (#4938)                        | Georgi Gerganov
2024-01-15 | llama : check for 256 divisibility for IQ2_XS, IQ2_XXS (#4950)    | Kawrakow
2024-01-15 | CUDA: faster dequantize kernels for Q4_0 and Q4_1 (#4938)         | Kawrakow
2024-01-14 | llama : fix missing quotes (#4937)                                | David Pflug
2024-01-14 | Add ability to use importance matrix for all k-quants (#4930)     | Kawrakow
2024-01-14 | llama : check LLAMA_TRACE env for extra logging (#4929)           | Georgi Gerganov
2024-01-14 | scripts : sync-ggml-am.sh option to skip commits                  | Georgi Gerganov
2024-01-14 | llama : use LLAMA_LOG_ macros for logging                         | Georgi Gerganov
2024-01-14 | Fix ffn_down quantization mix for MoE models (#4927)              | Kawrakow
2024-01-14 | metal : correctly set SIMD support flags on iOS (#4923)           | Alex Azarov
2024-01-14 | llama : support WinXP build with MinGW 8.1.0 (#3419)              | Karthik Kumar Viswanathan
2024-01-14 | 2-bit quantizations (#4897)                                       | Kawrakow
2024-01-14 | Make Q3_K_S be the same as olf Q3_K_L for Mixtral-8x7B (#4906)    | Kawrakow
2024-01-14 | sync : ggml                                                       | Georgi Gerganov
2024-01-13 | ggml: cache sin/cos for RoPE (#4908)                              | Johannes Gäßler
2024-01-13 | metal : remove old API (#4919)                                    | Georgi Gerganov
2024-01-13 | server : fix prompt caching with system prompt (#4914)            | Georgi Gerganov
2024-01-13 | llama : fix detokenization of non-special added-tokens (#4916)    | Georgi Gerganov
2024-01-13 | metal : disable log for loaded kernels (#4794)                    | Georgi Gerganov
2024-01-13 | llama : minimize size used for state save/load (#4820)            | David Friehs
2024-01-13 | workflows: unbreak nix-build-aarch64, and split it out (#4915)    | Someone
2024-01-13 | main : add parameter --no-display-prompt (#4541)                  | Yann Follet
2024-01-13 | gguf : fix potential infinite for-loop (#4600)                    | texmex76
2024-01-13 | metal : refactor kernel loading code (#4794)                      | Georgi Gerganov
2024-01-13 | compare-llama-bench: tweak output format (#4910)                  | Johannes Gäßler
2024-01-13 | server : fix deadlock that occurs in multi-prompt scenarios (#4905) | Ziad Ben Hadj-Alouane
2024-01-13 | server : fix crash with multimodal models without BOS token (#4904) | makomk
2024-01-13 | convert : update phi-2 to latest HF repo (#4903)                  | Georgi Gerganov
2024-01-12 | sync : ggml                                                       | Georgi Gerganov
2024-01-12 | ggml : fix 32-bit ARM compat for IQ2_XS (whisper/1758)            | Georgi Gerganov
2024-01-12 | backend_sched : fix assignments                                   | slaren
2024-01-12 | examples : add pydantic models to GBNF grammar generator (#4883)  | Maximilian Winter
2024-01-12 | CUDA: faster q8_0 -> f16 dequantization (#4895)                   | Johannes Gäßler
2024-01-12 | llama : ggml-backend integration (#4766)                          | slaren
2024-01-12 | llama : remove redundant assert for StableLM (#4901)              | Georgi Gerganov
2024-01-12 | export-lora : use LLAMA_FILE_MAGIC_GGLA (#4894)                   | Daniel Bevenius
2024-01-12 | llama.swiftui : update models layout (#4826)                      | Zay
2024-01-12 | gitignore : imatrix                                               | Georgi Gerganov
2024-01-12 | CUDA: fix softmax compile for old CUDA versions (#4862)           | Johannes Gäßler
2024-01-12 | llama : fix typo "imp_embd" -> "inp_embd"                         | Georgi Gerganov
2024-01-12 | common : streamline the formatting of help (#4890)                | howlger
2024-01-12 | py : fix lint (#4889)                                             | Georgi Gerganov
2024-01-12 | llama : fix llm_build_k_shift to use correct n_rot (#4889)        | Georgi Gerganov