path: root/llama.cpp
Age        | Commit message                                                                | Author
2023-10-17 | fix embeddings when using CUDA (#3657)                                        | slaren
2023-10-17 | llama : avoid fprintf in favor of LLAMA_LOG (#3538)                           | Georgi Gerganov
2023-10-17 | tokenizer : special token handling (#3538)                                    | staviq
2023-10-15 | MPT : support GQA for replit-code-v1.5 (#3627)                                | cebtenzzre
2023-10-13 | llama : remove n_threads from llama_decode_internal (#3614)                   | Daniel Bevenius
2023-10-10 | Minor improvements in GPT2 tokenizer (#3567)                                  | goerch
2023-10-10 | llm : add bloom models (#3553)                                                | Xingchen Song(宋星辰)
2023-10-10 | llm : add MPT support (#3417)                                                 | Jan Ploski
2023-10-09 | refact : fix convert script + zero out KV cache to avoid nans (#3523)         | Georgi Gerganov
2023-10-08 | sync : ggml (ggml-backend) (#3548)                                            | Georgi Gerganov
2023-10-08 | llama : fix missing break in Persimmon arch case statements (#3535)           | Kerfuffle
2023-10-07 | quantize : fail fast on write errors (#3521)                                  | cebtenzzre
2023-10-07 | llm : support Adept Persimmon 8B (#3410)                                      | Phillip Kravtsov
2023-10-07 | Fix for #3454 (#3455)                                                         | goerch
2023-10-06 | kv cache slot search improvements (#3493)                                     | Kerfuffle
2023-10-06 | parallel : add option to load external prompt file (#3416)                    | pudepiedj
2023-10-06 | llama : correct hparams comparison (#3446)                                    | l3utterfly
2023-10-04 | llm : add Refact model (#3329)                                                | ds5t5
2023-10-03 | llama : fix session saving/loading (#3400)                                    | Georgi Gerganov
2023-10-03 | llama : expose model's rope_freq_scale in the API (#3418)                     | Alex Klinkhamer
2023-10-03 | Work on the BPE tokenizer (#3252)                                             | goerch
2023-10-02 | metal : set log callback before initializing (#3427)                          | Adrian
2023-10-02 | infill : add new example + extend server API (#3296)                          | vvhg1
2023-09-29 | llama : quantize up to 31% faster on Linux and Windows with mmap (#3206)      | Cebtenzzre
2023-09-28 | build : enable more non-default compiler warnings (#3200)                     | Cebtenzzre
2023-09-28 | llama.cpp : split llama_context_params into model and context params (#3301)  | slaren
2023-09-28 | train : finetune LORA (#2632)                                                 | xaedes
2023-09-28 | gguf : make token scores and types optional (#3347)                           | Cebtenzzre
2023-09-28 | llama : custom attention mask + parallel decoding + no context swaps (#3228)  | Georgi Gerganov
2023-09-27 | gguf : fix a few general keys (#3341)                                         | Cebtenzzre
2023-09-27 | metal : reusing llama.cpp logging (#3152)                                     | Rickard Hallerbäck
2023-09-21 | CUDA: use only 1 thread if fully offloaded (#2915)                            | Johannes Gäßler
2023-09-20 | llama : allow gguf RoPE keys to be overridden with defaults (#3240)           | Cebtenzzre
2023-09-17 | llama.cpp : show model size and BPW on load (#3223)                           | slaren
2023-09-16 | Fixing the last deviations from sentencepiece indicated by test-tokenizer-1 (... | goerch
2023-09-15 | check C++ code with -Wmissing-declarations (#3184)                            | Cebtenzzre
2023-09-15 | llama : add support for StarCoder model architectures (#3187)                 | Meng Zhang
2023-09-15 | metal : relax conditions on fast matrix multiplication kernel (#3168)         | Georgi Gerganov
2023-09-14 | llama : make quantize example up to 2.7x faster (#3115)                       | Cebtenzzre
2023-09-14 | feature : support Baichuan serial models (#3009)                              | jameswu2014
2023-09-13 | whisper : tokenizer fix + re-enable tokenizer test for LLaMa (#3096)          | goerch
2023-09-08 | examples : make n_ctx warning work again (#3066)                              | Cebtenzzre
2023-09-08 | build : do not use _GNU_SOURCE gratuitously (#2035)                           | Przemysław Pawełczyk
2023-09-08 | enable CPU HBM (#2603)                                                        | Kunshang Ji
2023-09-07 | fix some warnings from gcc and clang-tidy (#3038)                             | Cebtenzzre
2023-09-07 | ggml : posixify madvise and pagesize (#3037)                                  | Przemysław Pawełczyk
2023-09-05 | llama : update logic for number of threads when using BLAS                    | Georgi Gerganov
2023-09-05 | speculative : add grammar support (#2991)                                     | Georgi Gerganov
2023-09-04 | build : on Mac OS enable Metal by default (#2901)                             | Georgi Gerganov
2023-09-03 | llama : fix bpe tokenize from byte (#2889)                                    | opparco