path: root/llama.h
Age        | Commit message                                                                   | Author
2024-04-11 | grammars: 1.5x faster inference w/ complex grammars (vector reserves / reuses... | Olivier Chafik
2024-04-09 | BERT tokenizer fixes (#6498)                                                     | Jared Van Bortel
2024-04-08 | llama : support negative ith in llama_get_ API (#6519)                           | Rick G
2024-04-08 | llama : save and restore kv cache for single seq id (#6341)                      | Jan Boon
2024-04-04 | examples : add GBNF validator program (#5948)                                    | Clint Herron
2024-03-28 | convert : refactor vocab selection logic (#6355)                                 | Jared Van Bortel
2024-03-26 | llama : greatly reduce output buffer memory usage (#6122)                        | compilade
2024-03-26 | IQ1_M: 1.75 bpw quantization (#6302)                                             | Kawrakow
2024-03-26 | quantize : be able to override metadata by key (#6321)                           | Kawrakow
2024-03-22 | quantize : options for output and token embedding tensors qtype (#6239)          | Kawrakow
2024-03-22 | llama_model_loader: support multiple split/shard GGUFs (#6187)                   | Pierrick Hymbert
2024-03-15 | llama : add support for control vectors (#5970)                                  | Theia Vogel
2024-03-14 | llama : support models without vocabulary (#5798)                                | Michael Podvitskiy
2024-03-13 | llama : add pipeline parallelism support (#6017)                                 | slaren
2024-03-11 | llama : more consistent names of count variables (#5994)                         | Georgi Gerganov
2024-03-11 | llama : fix F16/F32 downcast + improve names (#5980)                             | Georgi Gerganov
2024-03-10 | llama : add support for GritLM (#5959)                                           | DAN™
2024-03-08 | llama : support Mamba Selective State Space Models (#5328)                       | compilade
2024-03-04 | llama : fix embeddings (#5796)                                                   | Georgi Gerganov
2024-03-03 | llama : allow for user specified embedding pooling type (#5849)                  | Douglas Hanley
2024-03-02 | llama : add abort_callback to interrupt computation (#5409)                      | Michael Podvitskiy
2024-03-01 | llama : cleanup unused mmq flags (#5772)                                         | Pierrick Hymbert
2024-02-29 | llama : constified `llama_set_state_data`'s `src` (#5774)                        | Marcus Dunn
2024-02-28 | llama : remove deprecated API (#5770)                                            | Georgi Gerganov
2024-02-27 | IQ4_XS: a 4.25 bpw quantization (#5747)                                          | Kawrakow
2024-02-27 | llama : fix defrag bugs + add parameter (#5735)                                  | Georgi Gerganov
2024-02-26 | Adding IQ2_S and IQ2_M to complete coverage of the 2-3 bit quantization range... | Kawrakow
2024-02-25 | llama : refactor k-shift implementation + KV defragmentation (#5691)             | Georgi Gerganov
2024-02-25 | code : normalize enum names (#5697)                                              | Georgi Gerganov
2024-02-24 | IQ3_S: a much better alternative to Q3_K (#5676)                                 | Kawrakow
2024-02-22 | Add docs for llama_chat_apply_template (#5645)                                   | Xuan Son Nguyen
2024-02-21 | IQ4_NL: 4-bit non-linear quants with blocks of 32 (#5590)                        | Kawrakow
2024-02-19 | llama : add llama_chat_apply_template() (#5538)                                  | Xuan Son Nguyen
2024-02-18 | 1.5 bit quantization (#5453)                                                     | Kawrakow
2024-02-16 | ggml : add numa options (#5377)                                                  | bmwl
2024-02-15 | Use correct type of pooling for embedding models (#5500)                         | Douglas Hanley
2024-02-13 | llama : support batched embeddings (#5466)                                       | Douglas Hanley
2024-02-11 | Add support for BERT embedding models (#5423)                                    | Douglas Hanley
2024-02-03 | YaRN : store rope scaling type as int32_t in memory (#5285)                      | Jared Van Bortel
2024-01-31 | llama : remove LLAMA_MAX_DEVICES and LLAMA_SUPPORTS_GPU_OFFLOAD (#5240)          | Georgi Gerganov
2024-01-30 | SOTA 3-bit quants (#5196)                                                        | Kawrakow
2024-01-29 | Nomic Vulkan backend (#4456)                                                     | Jared Van Bortel
2024-01-28 | ggml : add Vulkan backend (#2059)                                                | 0cc4m
2024-01-28 | ggml : add unified SYCL backend for Intel GPUs (#2690)                           | Abhilash Majumder
2024-01-25 | llama : dynamic temperature sampling (#4972)                                     | l3utterfly
2024-01-22 | llama : add Q3_K_XS (#5060)                                                      | Kawrakow
2024-01-17 | backend : add eval callback (#4935)                                              | Georgi Gerganov
2024-01-15 | llama : apply classifier-free guidance to logits directly (#4951)                | David Friehs
2024-01-14 | 2-bit quantizations (#4897)                                                      | Kawrakow
2024-01-13 | llama : minimize size used for state save/load (#4820)                           | David Friehs