Age         Commit message                                                                    Author
2023-10-28  llama : add option for greedy sampling with probs (#3813)  (Georgi Gerganov)
2023-10-28  common : print that one line of the syntax help *also* to standard output (#3...  (Henk Poley)
2023-10-28  starcoder : add GPU offloading (#3827)  (Georgi Gerganov)
2023-10-28  speculative : ensure draft and target model vocab matches (#3812)  (Kerfuffle)
2023-10-27  llama : correctly report GGUFv3 format (#3818)  (cebtenzzre)
2023-10-27  simple : fix batch handling (#3803)  (Thibault Terrasson)
2023-10-27  cuda : improve text-generation and batched decoding performance (#3776)  (Georgi Gerganov)
2023-10-26  server : do not release slot on image input (#3798)  (Georgi Gerganov)
2023-10-25  batched-bench : print params at start  (Georgi Gerganov)
2023-10-25  log : disable pid in log filenames  (Georgi Gerganov)
2023-10-24  server : add parameter -tb N, --threads-batch N (#3584) (#3768)  (cebtenzzre)
2023-10-24  server : do not block system prompt update (#3767)  (Georgi Gerganov)
2023-10-24  sync : ggml (conv ops + cuda MSVC fixes) (#3765)  (Georgi Gerganov)
2023-10-24  cmake : add missed dependencies (#3763)  (John Smith)
2023-10-24  cuda : add batched cuBLAS GEMM for faster attention (#3749)  (Georgi Gerganov)
2023-10-24  Add more tokenizer tests (#3742)  (Galunid)
2023-10-24  metal : handle ggml_scale for n%4 != 0 (close #3754)  (Georgi Gerganov)
2023-10-23  Revert "make : add optional CUDA_NATIVE_ARCH (#2482)"  (Georgi Gerganov)
2023-10-23  issues : separate bug and enhancement template + no default title (#3748)  (M. Yusuf Sarıgöz)
2023-10-23  Update special token handling in conversion scripts for gpt2 derived tokenize...  (Galunid)
2023-10-23  llama : remove token functions with `context` args in favor of `model` (#3720)  (Marcus Dunn)
2023-10-23  Fix baichuan convert script not detecting model (#3739)  (Galunid)
2023-10-22  make : add optional CUDA_NATIVE_ARCH (#2482)  (Alex)
2023-10-22  server : parallel decoding and multimodal (#3677)  (Georgi Gerganov)
2023-10-22  Add test for MPT tokenization (#3728)  (goerch)
2023-10-22  readme : remove unsupported node.js library (#3703)  (Ian Scrivener)
2023-10-22  llama : validate special token ids are in range when loading GGUF model (#3635)  (Kerfuffle)
2023-10-22  main : escape prompt for cfg_negative_prompt and consecutive inputs in main w...  (vvhg1)
2023-10-22  batched : add len CLI argument  (Georgi Gerganov)
2023-10-20  CLBlast: Add outer loops over src0 for broadcasting in mulmat  (shibe2)
2023-10-20  sampling : refactor init to use llama_sampling_params (#3696)  (Georgi Gerganov)
2023-10-20  gguf : support big endian platform (#3552)  (Qin Yue Chen)
2023-10-20  server : fix uninitialized sampling context (close #3685)  (Georgi Gerganov)
2023-10-20  ggml : fix rope + llama minor optimizations (#3560)  (Herman Semenov)
2023-10-20  convert : restore compat with old Falcon models (#3680)  (cebtenzzre)
2023-10-19  multimodal : add BakLLaVA conversion support (#3682)  (M. Yusuf Sarıgöz)
2023-10-19  llava : avoid segfault in case of non-existent mmproj file (#3674)  (M. Yusuf Sarıgöz)
2023-10-18  readme : update hot topics  (Georgi Gerganov)
2023-10-18  speculative : bug fixes  (Georgi Gerganov)
2023-10-18  speculative : add tree-based sampling example (#3624)  (Georgi Gerganov)
2023-10-18  metal : implement q5_0 and q5_1 kernels (#3648)  (Jhen-Jie Hong)
2023-10-18  opencl : fix element-wise multiplication (#3656)  (shibe2)
2023-10-17  fix embeddings when using CUDA (#3657)  (slaren)
2023-10-17  llama : avoid fprintf in favor of LLAMA_LOG (#3538)  (Georgi Gerganov)
2023-10-17  readme : update hot-topics & models, detail windows release in usage (#3615)  (BarfingLemurs)
2023-10-17  CLBlast: Fix temporary buffer size for f16 conversion (wsize)  (shibe2)
2023-10-17  train-text-from-scratch : fix assert failure in ggml-alloc (#3618)  (slaren)
2023-10-17  editorconfig : remove trailing spaces  (Georgi Gerganov)
2023-10-17  server : documentation of JSON return value of /completion endpoint (#3632)  (coezbek)
2023-10-17  save-load-state : fix example + add ci test (#3655)  (Georgi Gerganov)