path: root/common/common.h
Age        | Commit message                                                            | Author
-----------|---------------------------------------------------------------------------|----------------
2024-03-25 | examples : add "retrieval" (#6193)                                        | Minsoo Cheong
2024-03-23 | common: llama_load_model_from_url split support (#6192)                   | Pierrick Hymbert
2024-03-23 | lookup: complement data from context with general text statistics (#5479) | Johannes Gäßler
2024-03-22 | common : add HF arg helpers (#6234)                                       | Georgi Gerganov
2024-03-22 | server : enable continuous batching by default (#6231)                    | Georgi Gerganov
2024-03-17 | common: llama_load_model_from_url using --model-url (#6098)               | Pierrick Hymbert
2024-03-15 | llama : add support for control vectors (#5970)                           | Theia Vogel
2024-03-14 | embedding : print cosine similarity (#899)                                | Georgi Gerganov
2024-03-13 | llama : add pipeline parallelism support (#6017)                          | slaren
2024-03-09 | server : normalize embeddings (#5956)                                     | SeungWon Jeong
2024-03-04 | speculative : implement stochastic speculative sampling (#5625)           | Minsoo Cheong
2024-03-04 | common : use LLAMA_DEFAULT_SEED (#5855)                                   | DAN™
2024-03-03 | llama : allow for user specified embedding pooling type (#5849)           | Douglas Hanley
2024-03-01 | llama : cleanup unused mmq flags (#5772)                                  | Pierrick Hymbert
2024-02-27 | llama : fix defrag bugs + add parameter (#5735)                           | Georgi Gerganov
2024-02-25 | code : normalize enum names (#5697)                                       | Georgi Gerganov
2024-02-16 | server : add "samplers" param to control the samplers order (#5494)       | Alexey Parfenov
2024-02-16 | ggml : add numa options (#5377)                                           | bmwl
2024-02-11 | common : use enums for sampler types (#5418)                              | Alexey Parfenov
2024-02-03 | YaRN : store rope scaling type as int32_t in memory (#5285)               | Jared Van Bortel
2024-01-31 | llama : remove LLAMA_MAX_DEVICES and LLAMA_SUPPORTS_GPU_OFFLOAD (#5240)   | Georgi Gerganov
2024-01-22 | KL-divergence (#5076)                                                     | Kawrakow
2024-01-21 | Add ability to evauate multiple choice tasks (#5047)                      | Kawrakow
2024-01-18 | Add Winogrande evaluation (#5015)                                         | Kawrakow
2024-01-16 | speculative : threading options (#4959)                                   | stduhpf
2024-01-13 | main : add parameter --no-display-prompt (#4541)                          | Yann Follet
2024-01-12 | llama : ggml-backend integration (#4766)                                  | slaren
2024-01-11 | main : better name for variable n_print (#4874)                           | Georgi Gerganov
2024-01-11 | main : disable token count by default (#4874)                             | Georgi Gerganov
2024-01-11 | main : print total token count and tokens consumed so far (#4874)         | pudepiedj
2024-01-08 | main : add self-extend support (#4815)                                    | Georgi Gerganov
2023-12-22 | lookup : add prompt lookup decoding example (#4484)                       | LeonEricsson
2023-12-07 | llama : per-layer KV cache + quantum K cache (#4309)                      | Georgi Gerganov
2023-12-05 | llama : allow overriding GGUF metadata when loading model (#4092)         | Kerfuffle
2023-12-05 | sampling : custom samplers order (#4285)                                  | MaggotHATE
2023-11-23 | llama : KV cache view API + better KV cache management (#4170)            | Georgi Gerganov
2023-11-20 | main : Add ChatML functionality to main example (#4046)                   | Seb C
2023-11-16 | Respect tokenizer.ggml.add_bos_token value when tokenizing (#4040)        | Kerfuffle
2023-11-03 | speculative : change default p_accept to 0.5 + CLI args (#3919)           | Georgi Gerganov
2023-11-03 | common : YAYF (yet another YARN fix) (#3925)                              | Georgi Gerganov
2023-11-02 | build : link against build info instead of compiling against it (#3879)   | cebtenzzre
2023-11-01 | llama : implement YaRN RoPE scaling (#2268)                               | cebtenzzre
2023-11-01 | common : allow caller to handle help/argument exceptions (#3715)          | bandoti
2023-10-20 | sampling : refactor init to use llama_sampling_params (#3696)             | Georgi Gerganov
2023-10-18 | speculative : add tree-based sampling example (#3624)                     | Georgi Gerganov
2023-10-17 | tokenizer : special token handling (#3538)                                | staviq
2023-10-12 | examples: support LLaVA v1.5 (multimodal model) (#3436)                   | M. Yusuf Sarıgöz
2023-10-11 | common : fix mirostat state when using multiple sequences (#3543)         | Kerfuffle
2023-10-06 | parallel : add option to load external prompt file (#3416)                | pudepiedj
2023-10-02 | infill : add new example + extend server API (#3296)                      | vvhg1