path: root/llama.cpp
Age | Commit message | Author
2023-12-21 | llama : initial ggml-backend integration (#4520) | slaren
2023-12-21 | llama : allow getting n_batch from llama_context in c api (#4540) | Marcus Dunn
2023-12-21 | llama : disable per-tensor info prints on model load (#4562) | Johannes Gäßler
2023-12-18 | llama : add phi-2 + fix NeoX rope + ggml_mul_mat_set_prec (#4490) | Ebey Abraham
2023-12-18 | llama : fix try_override for bool_value which always return true (#4519) | hankcs
2023-12-17 | decode : fix logits_valid for legacy API (#4516) | Jared Van Bortel
2023-12-17 | llama.swiftui : add bench functionality (#4483) | Georgi Gerganov
2023-12-16 | lora : add support for non-llama models (#3333) | slaren
2023-12-15 | llama : sanity checks for access to logits (#4274) | Jared Van Bortel
2023-12-14 | ggml : remove n_dims from ggml_tensor (#4469) | slaren
2023-12-14 | ggml : add ggml_row_size() (fixes llama out of space) (#4461) | LostRuins
2023-12-13 | llama : add Mixtral support (#4406) | slaren
2023-12-12 | english : use `typos` to fix comments and logs (#4354) | Richard Kiss
2023-12-09 | grammar : revert the replacement of llama_token_to_piece with id_to_token (#4... | Xiang (Kevin) Li
2023-12-07 | llama : per-layer KV cache + quantum K cache (#4309) | Georgi Gerganov
2023-12-05 | grammar : pre-computed pieces + reserve mem + less string copies (#4330) | Marcus Dunn
2023-12-05 | llama : allow overriding GGUF metadata when loading model (#4092) | Kerfuffle
2023-12-03 | llama : pad KV cache size (#4280) | Georgi Gerganov
2023-12-01 | llama : avoid using "optional" keyword (#4283) | Georgi Gerganov
2023-12-01 | llama : support optional tensors (#4283) | Georgi Gerganov
2023-12-01 | llama : support attention bias on LLaMA architecture (#4283) | CausalLM
2023-12-01 | llama : add Qwen support (#4281) | Shijie
2023-12-01 | llama : fix integer overflow during quantization (#4284) | Georgi Gerganov
2023-12-01 | ggml : add ggml_soft_max_ext (#4256) | Georgi Gerganov
2023-12-01 | build : fix build info generation and cleanup Makefile (#3920) | Jared Van Bortel
2023-11-30 | llama : fix alignment of general.name in print meta (#4254) | Daniel Bevenius
2023-11-30 | llama : fix typical sampling (#4261) | tarcey
2023-11-28 | ggml : re-enable BLAS for CPU when src0 != F32 + remove redundant full offloa... | Georgi Gerganov
2023-11-25 | llama : grammar `reserve` space in `decode_utf8` (#4210) | Marcus Dunn
2023-11-24 | llama : set metal log callback correctly (#4204) | slaren
2023-11-24 | ggml-cuda : support stablelm rope (#4156) | slaren
2023-11-23 | llama : KV cache view API + better KV cache management (#4170) | Georgi Gerganov
2023-11-21 | stablelm : simplify + speedup generation (#4153) | Galunid
2023-11-19 | gguf-py : export chat templates (#4125) | slaren
2023-11-17 | llama : increase max nodes (#4115) | slaren
2023-11-17 | llama : add functions to get the model's metadata (#4013) | slaren
2023-11-17 | llama : fix data units (#4101) | Georgi Gerganov
2023-11-16 | Respect tokenizer.ggml.add_bos_token value when tokenizing (#4040) | Kerfuffle
2023-11-15 | llama : restore prefix space in llama tokenizer (#4081) | Jared Van Bortel
2023-11-14 | stablelm : StableLM support (#3586) | Galunid
2023-11-13 | sync : ggml (backend v2) (#3912) | Georgi Gerganov
2023-11-13 | Add ReLU and SQR CUDA ops to (partially) fix Persimmon offloading (#4041) | Kerfuffle
2023-11-10 | Unbreak persimmon after #3837 (#4010) | Galunid
2023-11-07 | cuda : supports running on CPU for GGML_USE_CUBLAS=ON build (#3946) | Meng Zhang
2023-11-05 | llama : mark LLM_ARCH_STARCODER as full offload supported (#3945) | Meng Zhang
2023-11-03 | llama : change yarn_ext_factor placeholder to -1 (#3922) | cebtenzzre
2023-11-02 | llm : prevent from 1-D tensors being GPU split (#3697) | Georgi Gerganov
2023-11-01 | llama : fix llama_context_default_params after #2268 (#3893) | cebtenzzre
2023-11-01 | llama : implement YaRN RoPE scaling (#2268) | cebtenzzre
2023-11-01 | llm : fix llm_build_kqv taking unused tensor (benign, #3837) | Georgi Gerganov