Age | Commit message | Author
2024-05-03 | llama : rename ctx to user_data in progress_callback (#7045) | Daniel Bevenius
2024-05-03 | Remove .attention from skipped tensors to match more accurately (#7051) | Bartowski
2024-05-02 | chore: fix typo in llama.cpp (#7032) | alwqx
2024-05-01 | Update LOG_IMPL and LOG_TEE_IMPL (#7029) | Andrew Downing
2024-05-01 | main : fix off by one error for context shift (#6921) | l3utterfly
2024-05-01 | Server: add tests for batch size, different seeds (#6950) | Johannes Gäßler
2024-05-01 | CUDA: CUDART < 11.7 workaround for __hmax, __hmax2 (#7019) | Johannes Gäßler
2024-05-01 | ci : exempt confirmed bugs from being tagged as stale (#7014) | slaren
2024-04-30 | perplexity: more statistics, added documentation (#6936) | Johannes Gäßler
2024-04-30 | switch to using localizedDescription (#7010) | Kevin Gibbons
2024-04-30 | metal : remove deprecated error code (#7008) | Georgi Gerganov
2024-04-30 | metal : log more info on error (#6987) | Kevin Gibbons
2024-04-30 | ggml : add Flash Attention (#5021) | Georgi Gerganov
2024-04-30 | convert : use utf8 encoding (#7000) | Georgi Gerganov
2024-04-30 | Improve usability of --model-url & related flags (#6930) | Olivier Chafik
2024-04-29 | Extending grammar integration tests (#6644) | Clint Herron
2024-04-29 | main : fix typo in comment in main.cpp (#6985) | Daniel Bevenius
2024-04-29 | build(cmake): simplify instructions (`cmake -B build && cmake --build build .... | Olivier Chafik
2024-04-29 | ci : tmp disable gguf-split (#6983) | Georgi Gerganov
2024-04-29 | ggml : fix __MSC_VER -> _MSC_VER (#6977) | Georgi Gerganov
2024-04-29 | llava-cli : multiple images (#6969) | cpumaxx
2024-04-29 | readme : update hot topics | Georgi Gerganov
2024-04-29 | llama : fix BPE pre-tokenization (#6920) | Georgi Gerganov
2024-04-29 | sampling : use std::random_device{}() for default random seed (#6962) | David Renshaw
2024-04-29 | convert : fix conversion of some BERT embedding models (#6937) | Christian Zhou-Zheng
2024-04-29 | make : change GNU make default CXX from g++ to c++ (#6966) | Przemysław Pawełczyk
2024-04-29 | ci : add building in MSYS2 environments (Windows) (#6967) | Przemysław Pawełczyk
2024-04-29 | llama : fix typo LAMMAFILE -> LLAMAFILE (#6974) | Johannes Gäßler
2024-04-29 | Fix more int overflow during quant (PPL/CUDA). (#6563) | DAN™
2024-04-28 | gguf : enforce that tensor names are unique (#6905) | Xuan Son Nguyen
2024-04-28 | add device version in device list (#6959) | Neo Zhang
2024-04-28 | flake.lock: Update | github-actions[bot]
2024-04-27 | Replace "alternative" boolean operator in conditional compilation directive (... | mgroeber9110
2024-04-27 | ci: server: tests python env on github container ubuntu latest / fix n_predic... | Pierrick Hymbert
2024-04-26 | Reset schedule earlier to allow overlap with ggml graph computation on device... | agray3
2024-04-26 | quantize: add imatrix and dataset metadata in GGUF (#6658) | Pierrick Hymbert
2024-04-26 | add basic tensor data validation function (#6884) | slaren
2024-04-26 | gguf : fix mismatch between alloc and free functions (#6929) | slaren
2024-04-26 | llamafile : use 64-bit integers in sgemm (#6928) | Justine Tunney
2024-04-26 | ci: server: fix python installation (#6925) | Pierrick Hymbert
2024-04-26 | server: stop generation at `n_ctx_train` if `n_predict` is not set (#6638) | Pierrick Hymbert
2024-04-26 | ci: server: fix python installation (#6922) | Pierrick Hymbert
2024-04-26 | Merge pull request from GHSA-p5mv-gjc5-mwqv | Georgi Gerganov
2024-04-26 | ci: server: fix python installation (#6918) | Pierrick Hymbert
2024-04-26 | ci: fix concurrency for pull_request_target (#6917) | Pierrick Hymbert
2024-04-26 | bench: server add stop word for PHI-2 (#6916) | Pierrick Hymbert
2024-04-25 | llava : add support for moondream vision language model (#6899) | vik
2024-04-25 | cmake : restore LLAMA_LLAMAFILE_DEFAULT | Georgi Gerganov
2024-04-25 | cmake : remove obsolete ANDROID check | Georgi Gerganov
2024-04-25 | llama : synchronize before get/set session data (#6911) | slaren