Commit log for root/llama.cpp
Age | Commit message | Author
2023-12-03 | llama : pad KV cache size (#4280) | Georgi Gerganov
2023-12-01 | llama : avoid using "optional" keyword (#4283) | Georgi Gerganov
2023-12-01 | llama : support optional tensors (#4283) | Georgi Gerganov
2023-12-01 | llama : support attention bias on LLaMA architecture (#4283) | CausalLM
2023-12-01 | llama : add Qwen support (#4281) | Shijie
2023-12-01 | llama : fix integer overflow during quantization (#4284) | Georgi Gerganov
2023-12-01 | ggml : add ggml_soft_max_ext (#4256) | Georgi Gerganov
2023-12-01 | build : fix build info generation and cleanup Makefile (#3920) | Jared Van Bortel
2023-11-30 | llama : fix alignment of general.name in print meta (#4254) | Daniel Bevenius
2023-11-30 | llama : fix typical sampling (#4261) | tarcey
2023-11-28 | ggml : re-enable BLAS for CPU when src0 != F32 + remove redundant full offloa... | Georgi Gerganov
2023-11-25 | llama : grammar `reserve` space in `decode_utf8` (#4210) | Marcus Dunn
2023-11-24 | llama : set metal log callback correctly (#4204) | slaren
2023-11-24 | ggml-cuda : support stablelm rope (#4156) | slaren
2023-11-23 | llama : KV cache view API + better KV cache management (#4170) | Georgi Gerganov
2023-11-21 | stablelm : simplify + speedup generation (#4153) | Galunid
2023-11-19 | gguf-py : export chat templates (#4125) | slaren
2023-11-17 | llama : increase max nodes (#4115) | slaren
2023-11-17 | llama : add functions to get the model's metadata (#4013) | slaren
2023-11-17 | llama : fix data units (#4101) | Georgi Gerganov
2023-11-16 | Respect tokenizer.ggml.add_bos_token value when tokenizing (#4040) | Kerfuffle
2023-11-15 | llama : restore prefix space in llama tokenizer (#4081) | Jared Van Bortel
2023-11-14 | stablelm : StableLM support (#3586) | Galunid
2023-11-13 | sync : ggml (backend v2) (#3912) | Georgi Gerganov
2023-11-13 | Add ReLU and SQR CUDA ops to (partially) fix Persimmon offloading (#4041) | Kerfuffle
2023-11-10 | Unbreak persimmon after #3837 (#4010) | Galunid
2023-11-07 | cuda : supports running on CPU for GGML_USE_CUBLAS=ON build (#3946) | Meng Zhang
2023-11-05 | llama : mark LLM_ARCH_STARCODER as full offload supported (#3945) | Meng Zhang
2023-11-03 | llama : change yarn_ext_factor placeholder to -1 (#3922) | cebtenzzre
2023-11-02 | llm : prevent 1-D tensors from being GPU split (#3697) | Georgi Gerganov
2023-11-01 | llama : fix llama_context_default_params after #2268 (#3893) | cebtenzzre
2023-11-01 | llama : implement YaRN RoPE scaling (#2268) | cebtenzzre
2023-11-01 | llm : fix llm_build_kqv taking unused tensor (benign, #3837) | Georgi Gerganov
2023-11-01 | llm : fix falcon norm after refactoring (#3837) | Georgi Gerganov
2023-11-01 | llm : add llm_build_context (#3881) | Georgi Gerganov
2023-11-01 | finetune : add -ngl parameter (#3762) | Andrew Godfrey
2023-11-01 | llama : refactor graph build code (#3837) | Georgi Gerganov
2023-10-31 | samplers : Min-P sampler implementation [alternative to Top P/Top K] (#3841) | kalomaze (see the Min-P sketch after this table)
2023-10-30 | ggml : move FP16 <-> FP32 code to ggml-impl.h (#3861) | Georgi Gerganov
2023-10-29 | Extend llama_kv_cache_seq_rm to allow matching any sequence (#3843) | Kerfuffle
2023-10-29 | llama : fix kv shift bug (#3835) | Georgi Gerganov
2023-10-29 | ggml : quantization refactoring (#3833) | Georgi Gerganov
2023-10-28 | llama : allow quantizing k-quants to fall back when tensor size incompatible ... | Kerfuffle
2023-10-28 | starcoder : add GPU offloading (#3827) | Georgi Gerganov
2023-10-27 | llama : correctly report GGUFv3 format (#3818) | cebtenzzre
2023-10-27 | cuda : improve text-generation and batched decoding performance (#3776) | Georgi Gerganov
2023-10-23 | llama : remove token functions with `context` args in favor of `model` (#3720) | Marcus Dunn
2023-10-22 | Add test for MPT tokenization (#3728) | goerch
2023-10-22 | llama : validate special token ids are in range when loading GGUF model (#3635) | Kerfuffle
2023-10-20 | sampling : refactor init to use llama_sampling_params (#3696) | Georgi Gerganov
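
A note for readers unfamiliar with the Min-P sampler added in #3841 (2023-10-31 above): instead of a fixed Top-K count or a fixed Top-P probability mass, Min-P keeps only the tokens whose probability is at least a fraction min_p of the most likely token's probability, so the candidate set shrinks when the model is confident and widens when it is not. Below is a minimal C++ sketch of that idea, not the actual llama.cpp implementation; the function name sample_min_p is illustrative, and it assumes the probabilities are already softmax-normalized.

    // Minimal sketch of Min-P sampling (the idea behind #3841).
    // Not the llama.cpp implementation; assumes `probs` is already
    // softmax-normalized over the vocabulary.
    #include <algorithm>
    #include <random>
    #include <vector>

    int sample_min_p(const std::vector<float> & probs, float min_p, std::mt19937 & rng) {
        // The cutoff scales with the most likely token: anything below
        // min_p * p_max is discarded.
        const float p_max  = *std::max_element(probs.begin(), probs.end());
        const float cutoff = min_p * p_max;

        std::vector<int>   ids;   // indices of the surviving tokens
        std::vector<float> kept;  // their (unnormalized) weights
        for (int i = 0; i < (int) probs.size(); ++i) {
            if (probs[i] >= cutoff) {
                ids.push_back(i);
                kept.push_back(probs[i]);
            }
        }

        // discrete_distribution renormalizes the kept weights,
        // then we draw one token from them.
        std::discrete_distribution<int> dist(kept.begin(), kept.end());
        return ids[dist(rng)];
    }

With min_p = 0.05, for example, a peak probability of 0.6 discards every token below 0.03, while a flat distribution peaking at 0.02 keeps every token above 0.001, which is the adaptive behavior the entry's "[alternative to Top P/Top K]" tag refers to.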