Age | Commit message | Author
2024-04-28 | flake.lock: Update | github-actions[bot]
2024-04-27 | Replace "alternative" boolean operator in conditional compilation directive (... | mgroeber9110
2024-04-27 | ci: server: tests python env on github container ubuntu latest / fix n_predic... | Pierrick Hymbert
2024-04-26 | Reset schedule earlier to allow overlap with ggml graph computation on device... | agray3
2024-04-26 | quantize: add imatrix and dataset metadata in GGUF (#6658) | Pierrick Hymbert
2024-04-26 | add basic tensor data validation function (#6884) | slaren
2024-04-26 | gguf : fix mismatch between alloc and free functions (#6929) | slaren
2024-04-26 | llamafile : use 64-bit integers in sgemm (#6928) | Justine Tunney
2024-04-26 | ci: server: fix python installation (#6925) | Pierrick Hymbert
2024-04-26 | server: stop generation at `n_ctx_train` if `n_predict` is not set (#6638) | Pierrick Hymbert
2024-04-26 | ci: server: fix python installation (#6922) | Pierrick Hymbert
2024-04-26 | Merge pull request from GHSA-p5mv-gjc5-mwqv | Georgi Gerganov
2024-04-26 | ci: server: fix python installation (#6918) | Pierrick Hymbert
2024-04-26 | ci: fix concurrency for pull_request_target (#6917) | Pierrick Hymbert
2024-04-26 | bench: server add stop word for PHI-2 (#6916) | Pierrick Hymbert
2024-04-25 | llava : add support for moondream vision language model (#6899) | vik
2024-04-25 | cmake : restore LLAMA_LLAMAFILE_DEFAULT | Georgi Gerganov
2024-04-25 | cmake : remove obsolete ANDROID check | Georgi Gerganov
2024-04-25 | llama : synchronize before get/set session data (#6911) | slaren
2024-04-25 | ci : tmp disable slow tests | Georgi Gerganov
2024-04-25 | readme : update model list (#6908) | BarfingLemurs
2024-04-25 | llama : check that all the tensor data is in the model file (#6885) | slaren
2024-04-25 | ggml : fix redefinition of vaddvq_f32 for 32-bit ARM (#6906) | Georgi Gerganov
2024-04-25 | clip : rename lerp function to avoid conflict (#6894) | Daniel Bevenius
2024-04-25 | ggml : fix MIN / MAX macros (#6904) | Georgi Gerganov
2024-04-25 | tests : minor bash stuff (#6902) | Georgi Gerganov
2024-04-25 | quantize : add '--keep-split' to quantize model into shards (#6688) | jiez
2024-04-24 | README: add graphic for matrix multiplication (#6881) | Johannes Gäßler
2024-04-24 | llama : add llama_get_pooling_type function (#6862) | Douglas Hanley
2024-04-24 | server : do not apply Markdown formatting in code sections (#6850) | mgroeber9110
2024-04-24 | common : revert showing control tokens by default for server (#6860) | Kyle Mistele
2024-04-24 | Server: fix seed for multiple slots (#6835) | Johannes Gäßler
2024-04-24 | ggml : move 32-bit arm compat in ggml-impl.h (#6865) | Georgi Gerganov
2024-04-24 | llama : add phi 3 chat template (#6857) | Tristan Druyen
2024-04-24 | convert : add support of codeqwen due to tokenizer (#6707) | Junyang Lin
2024-04-24 | llama : add phi3 support (#6852) | liuwei-git
2024-04-23 | [SYCL] Windows default build instructions without -DLLAMA_SYCL_F16 flag activ... | Anas Ahouzi
2024-04-22 | llamafile : improve sgemm.cpp (#6796) | Justine Tunney
2024-04-22 | ggml : fix calloc argument ordering. (#6820) | Dave Airlie
2024-04-22 | llama : fix typo in <|im_end|> token text (#6745) | Georgi Gerganov
2024-04-22 | ci: fix job are cancelling each other (#6781) | Pierrick Hymbert
2024-04-22 | flake.lock: Update | github-actions[bot]
2024-04-21 | `build`: generate hex dump of server assets during build (#6661) | Olivier Chafik
2024-04-21 | llama : add option to render special/control tokens (#6807) | Georgi Gerganov
2024-04-21 | ggml : fix ggml_backend_cpu_supports_op() for CPY (#0) | Georgi Gerganov
2024-04-21 | llama : add llama-3 chat template (#6751) | Wouter
2024-04-21 | gguf-py : add IQ1_M to GGML_QUANT_SIZES (#6761) | pmysl
2024-04-21 | doc : add link to falcon (#6789) | Jan Boon
2024-04-21 | readme : add Fedora instructions (#6783) | Mohammadreza Hendiani
2024-04-21 | llava : use logger in llava-cli (#6797) | Justine Tunney