path: root/llama.cpp
Age | Commit message | Author
2023-10-28 | starcoder : add GPU offloading (#3827) | Georgi Gerganov
2023-10-27 | llama : correctly report GGUFv3 format (#3818) | cebtenzzre
2023-10-27 | cuda : improve text-generation and batched decoding performance (#3776) | Georgi Gerganov
2023-10-23 | llama : remove token functions with `context` args in favor of `model` (#3720) | Marcus Dunn
2023-10-22 | Add test for MPT tokenization (#3728) | goerch
2023-10-22 | llama : validate special token ids are in range when loading GGUF model (#3635) | Kerfuffle
2023-10-20 | sampling : refactor init to use llama_sampling_params (#3696) | Georgi Gerganov
2023-10-20 | ggml : fix rope + llama minor optimizations (#3560) | Herman Semenov
2023-10-18 | speculative : add tree-based sampling example (#3624) | Georgi Gerganov
2023-10-17 | fix embeddings when using CUDA (#3657) | slaren
2023-10-17 | llama : avoid fprintf in favor of LLAMA_LOG (#3538) | Georgi Gerganov
2023-10-17 | tokenizer : special token handling (#3538) | staviq
2023-10-15 | MPT : support GQA for replit-code-v1.5 (#3627) | cebtenzzre
2023-10-13 | llama : remove n_threads from llama_decode_internal (#3614) | Daniel Bevenius
2023-10-10 | Minor improvements in GPT2 tokenizer (#3567) | goerch
2023-10-10 | llm : add bloom models (#3553) | Xingchen Song(宋星辰)
2023-10-10 | llm : add MPT support (#3417) | Jan Ploski
2023-10-09 | refact : fix convert script + zero out KV cache to avoid nans (#3523) | Georgi Gerganov
2023-10-08 | sync : ggml (ggml-backend) (#3548) | Georgi Gerganov
2023-10-08 | llama : fix missing break in Persimmon arch case statements (#3535) | Kerfuffle
2023-10-07 | quantize : fail fast on write errors (#3521) | cebtenzzre
2023-10-07 | llm : support Adept Persimmon 8B (#3410) | Phillip Kravtsov
2023-10-07 | Fix for #3454 (#3455) | goerch
2023-10-06 | kv cache slot search improvements (#3493) | Kerfuffle
2023-10-06 | parallel : add option to load external prompt file (#3416) | pudepiedj
2023-10-06 | llama : correct hparams comparison (#3446) | l3utterfly
2023-10-04 | llm : add Refact model (#3329) | ds5t5
2023-10-03 | llama : fix session saving/loading (#3400) | Georgi Gerganov
2023-10-03 | llama : expose model's rope_freq_scale in the API (#3418) | Alex Klinkhamer
2023-10-03 | Work on the BPE tokenizer (#3252) | goerch
2023-10-02 | metal : set log callback before initializing (#3427) | Adrian
2023-10-02 | infill : add new example + extend server API (#3296) | vvhg1
2023-09-29 | llama : quantize up to 31% faster on Linux and Windows with mmap (#3206) | Cebtenzzre
2023-09-28 | build : enable more non-default compiler warnings (#3200) | Cebtenzzre
2023-09-28 | llama.cpp : split llama_context_params into model and context params (#3301) | slaren
2023-09-28 | train : finetune LORA (#2632) | xaedes
2023-09-28 | gguf : make token scores and types optional (#3347) | Cebtenzzre
2023-09-28 | llama : custom attention mask + parallel decoding + no context swaps (#3228) | Georgi Gerganov
2023-09-27 | gguf : fix a few general keys (#3341) | Cebtenzzre
2023-09-27 | metal : reusing llama.cpp logging (#3152) | Rickard Hallerbäck
2023-09-21 | CUDA: use only 1 thread if fully offloaded (#2915) | Johannes Gäßler
2023-09-20 | llama : allow gguf RoPE keys to be overridden with defaults (#3240) | Cebtenzzre
2023-09-17 | llama.cpp : show model size and BPW on load (#3223) | slaren
2023-09-16 | Fixing the last deviations from sentencepiece indicated by test-tokenizer-1 (... | goerch
2023-09-15 | check C++ code with -Wmissing-declarations (#3184) | Cebtenzzre
2023-09-15 | llama : add support for StarCoder model architectures (#3187) | Meng Zhang
2023-09-15 | metal : relax conditions on fast matrix multiplication kernel (#3168) | Georgi Gerganov
2023-09-14 | llama : make quantize example up to 2.7x faster (#3115) | Cebtenzzre
2023-09-14 | feature : support Baichuan serial models (#3009) | jameswu2014
2023-09-13 | whisper : tokenizer fix + re-enable tokenizer test for LLaMa (#3096) | goerch