Age        | Commit message                                                               | Author
2024-01-17 | py : fix whitespace                                                          | Georgi Gerganov
2024-01-17 | py : fix missing added_tokens_dict for SPM and BPE vocabs (#4971)            | Georgi Gerganov
2024-01-17 | llama : use Q4_K for attn_v for Q2_K_S when n_gqa >= 4 (#4996)               | Kawrakow
2024-01-17 | metal : remove unnecessary nil check (#4986)                                 | Paul Tsochantaris
2024-01-17 | llama : fix copy/paste error in llama_sampling_params comment (#4994)        | David Renshaw
2024-01-16 | py : remove unnecessary hasattr (#4903)                                      | Georgi Gerganov
2024-01-16 | nix: remove nixConfig from flake.nix (#4984)                                 | Philip Taron
2024-01-16 | finetune : add training data file to log message (#4979)                     | Daniel Bevenius
2024-01-16 | ggml : importance matrix support for legacy quants (#4969)                   | Kawrakow
2024-01-16 | examples : add complete parallel function calling example (#4974)            | Maximilian Winter
2024-01-16 | perplexity : fix kv cache handling for hellaswag (#4981)                     | Georgi Gerganov
2024-01-16 | flake.lock: update flake-parts, flake-parts/nixpkgs-lib, and nixpkgs (#4920) | Georgi Gerganov
2024-01-16 | metal : localized logic in `ggml_metal_graph_compute` (#4924)                | Paul Tsochantaris
2024-01-16 | android : introduce starter project example (#4926)                          | Neuman Vong
2024-01-16 | metal : replace loop of dispatch_async with dispatch_apply (#4934)           | Alex Azarov
2024-01-16 | metal : log `recommendedMaxWorkingSetSize` on iOS 16+ (#4936)                | Alex Azarov
2024-01-16 | examples : fix and improv docs for the grammar generator (#4909)             | Maximilian Winter
2024-01-16 | ggml : introduce GGML_CALL function annotation (#4850)                       | Justine Tunney
2024-01-16 | finetune : use LLAMA_FILE_MAGIC_GGLA (#4961)                                 | Daniel Bevenius
2024-01-16 | speculative : threading options (#4959)                                      | stduhpf
2024-01-15 | pass cpu-architecture arguments only to host code (C;C++) (#4943)            | ngc92
2024-01-15 | llama : apply classifier-free guidance to logits directly (#4951)            | David Friehs
2024-01-15 | awq-py : fix typo in awq-py/README.md (#4947)                                | Victor Z. Peng
2024-01-15 | cuda : fix dequantize kernel names (#4938)                                   | Georgi Gerganov
2024-01-15 | llama : check for 256 divisibility for IQ2_XS, IQ2_XXS (#4950)               | Kawrakow
2024-01-15 | CUDA: faster dequantize kernels for Q4_0 and Q4_1 (#4938)                    | Kawrakow
2024-01-14 | llama : fix missing quotes (#4937)                                           | David Pflug
2024-01-14 | Add ability to use importance matrix for all k-quants (#4930)                | Kawrakow
2024-01-14 | llama : check LLAMA_TRACE env for extra logging (#4929)                      | Georgi Gerganov
2024-01-14 | scripts : sync-ggml-am.sh option to skip commits                             | Georgi Gerganov
2024-01-14 | llama : use LLAMA_LOG_ macros for logging                                    | Georgi Gerganov
2024-01-14 | Fix ffn_down quantization mix for MoE models (#4927)                         | Kawrakow
2024-01-14 | metal : correctly set SIMD support flags on iOS (#4923)                      | Alex Azarov
2024-01-14 | llama : support WinXP build with MinGW 8.1.0 (#3419)                         | Karthik Kumar Viswanathan
2024-01-14 | 2-bit quantizations (#4897)                                                  | Kawrakow
2024-01-14 | Make Q3_K_S be the same as olf Q3_K_L for Mixtral-8x7B (#4906)               | Kawrakow
2024-01-14 | sync : ggml                                                                  | Georgi Gerganov
2024-01-13 | ggml: cache sin/cos for RoPE (#4908)                                         | Johannes Gäßler
2024-01-13 | metal : remove old API (#4919)                                               | Georgi Gerganov
2024-01-13 | server : fix prompt caching with system prompt (#4914)                       | Georgi Gerganov
2024-01-13 | llama : fix detokenization of non-special added-tokens (#4916)               | Georgi Gerganov
2024-01-13 | metal : disable log for loaded kernels (#4794)                               | Georgi Gerganov
2024-01-13 | llama : minimize size used for state save/load (#4820)                       | David Friehs
2024-01-13 | workflows: unbreak nix-build-aarch64, and split it out (#4915)               | Someone
2024-01-13 | main : add parameter --no-display-prompt (#4541)                             | Yann Follet
2024-01-13 | gguf : fix potential infinite for-loop (#4600)                               | texmex76
2024-01-13 | metal : refactor kernel loading code (#4794)                                 | Georgi Gerganov
2024-01-13 | compare-llama-bench: tweak output format (#4910)                             | Johannes Gäßler
2024-01-13 | server : fix deadlock that occurs in multi-prompt scenarios (#4905)          | Ziad Ben Hadj-Alouane
2024-01-13 | server : fix crash with multimodal models without BOS token (#4904)          | makomk