path: root/llama.h
| Age        | Commit message                                                  | Author            |
|------------|-----------------------------------------------------------------|-------------------|
| 2024-06-21 | llama : allow pooled embeddings on any model (#7477)            | Douglas Hanley    |
| 2024-06-14 | convert : add Poro-34B-chat tokenizer support (#7713)           | Elaine            |
| 2024-06-06 | Added support for . (any character) token in grammar engine. (#6467) | Clint Herron |
| 2024-06-05 | Fix per token attributes bits (#7749)                           | jaime-m-p         |
| 2024-06-04 | llama : remove beam search (#7736)                              | Georgi Gerganov   |
| 2024-06-04 | Per token attributes (#7685)                                    | jaime-m-p         |
| 2024-05-31 | llama : cache llama_token_to_piece (#7587)                      | Georgi Gerganov   |
| 2024-05-27 | llama : add comments about experimental flags (#7544)           | Georgi Gerganov   |
| 2024-05-26 | llama : add Smaug 70B support (#7402)                           | Bartowski         |
| 2024-05-25 | main : don't print special tokens with --grammar (#6923)        | Justine Tunney    |
| 2024-05-23 | llama : add getters for n_threads/n_threads_batch (#7464)       | Daniel Bevenius   |
| 2024-05-19 | Add StableLM2 pre-tokenizer (#7349)                             | Anas Ahouzi       |
| 2024-05-14 | ggml : add RPC backend (#6829)                                  | Radoslav Gerganov |
| 2024-05-08 | llama : add BPE pre-tokenization for Qwen2 (#7114)              | Ren Xuancheng     |
| 2024-05-08 | convert : add BPE pre-tokenization for DBRX (#7132)             | DAN™              |
| 2024-05-08 | ggml : introduce bfloat16 support (#6412)                       | Justine Tunney    |
| 2024-05-07 | Fix OLMo HF to GGUF conversion (#6910)                          | nopperl           |
| 2024-05-05 | command-r : add BPE pre-tokenization (#7063)                    | DAN™              |
| 2024-05-04 | tests : add test-tokenizer-0.sh + fix some tokenizers (#7036)   | Georgi Gerganov   |
| 2024-05-03 | llama : rename ctx to user_data in progress_callback (#7045)    | Daniel Bevenius   |
| 2024-04-30 | ggml : add Flash Attention (#5021)                              | Georgi Gerganov   |
| 2024-04-29 | llama : fix BPE pre-tokenization (#6920)                        | Georgi Gerganov   |
| 2024-04-26 | quantize: add imatrix and dataset metadata in GGUF (#6658)      | Pierrick Hymbert  |
| 2024-04-26 | add basic tensor data validation function (#6884)               | slaren            |
| 2024-04-25 | quantize : add '--keep-split' to quantize model into shards (#6688) | jiez          |
| 2024-04-24 | llama : add llama_get_pooling_type function (#6862)             | Douglas Hanley    |
| 2024-04-24 | Server: fix seed for multiple slots (#6835)                     | Johannes Gäßler   |
| 2024-04-21 | llama : add option to render special/control tokens (#6807)     | Georgi Gerganov   |
| 2024-04-21 | llama : support Llama 3 HF conversion (#6745)                   | Pedro Cuenca      |
| 2024-04-11 | grammars: 1.5x faster inference w/ complex grammars (vector reserves / reuses... | Olivier Chafik |
| 2024-04-09 | BERT tokenizer fixes (#6498)                                    | Jared Van Bortel  |
| 2024-04-08 | llama : support negative ith in llama_get_ API (#6519)          | Rick G            |
| 2024-04-08 | llama : save and restore kv cache for single seq id (#6341)     | Jan Boon          |
| 2024-04-04 | examples : add GBNF validator program (#5948)                   | Clint Herron      |
| 2024-03-28 | convert : refactor vocab selection logic (#6355)                | Jared Van Bortel  |
| 2024-03-26 | llama : greatly reduce output buffer memory usage (#6122)       | compilade         |
| 2024-03-26 | IQ1_M: 1.75 bpw quantization (#6302)                            | Kawrakow          |
| 2024-03-26 | quantize : be able to override metadata by key (#6321)          | Kawrakow          |
| 2024-03-22 | quantize: options for output and token embedding tensors qtype (#6239) | Kawrakow   |
| 2024-03-22 | llama_model_loader: support multiple split/shard GGUFs (#6187)  | Pierrick Hymbert  |
| 2024-03-15 | llama : add support for control vectors (#5970)                 | Theia Vogel       |
| 2024-03-14 | llama : support models without vocabulary (#5798)               | Michael Podvitskiy |
| 2024-03-13 | llama : add pipeline parallelism support (#6017)                | slaren            |
| 2024-03-11 | llama : more consistent names of count variables (#5994)        | Georgi Gerganov   |
| 2024-03-11 | llama : fix F16/F32 downcast + improve names (#5980)            | Georgi Gerganov   |
| 2024-03-10 | llama : add support for GritLM (#5959)                          | DAN™              |
| 2024-03-08 | llama : support Mamba Selective State Space Models (#5328)      | compilade         |
| 2024-03-04 | llama : fix embeddings (#5796)                                  | Georgi Gerganov   |
| 2024-03-03 | llama : allow for user specified embedding pooling type (#5849) | Douglas Hanley    |
| 2024-03-02 | llama : add abort_callback to interrupt computation (#5409)     | Michael Podvitskiy |