Age  Commit message  Author
2024-03-09  server : clarify some items in the readme (#5957)  Georgi Gerganov
2024-03-09  server : normalize embeddings (#5956)  SeungWon Jeong
2024-03-09  tests : gitignore ggml-common.h  Georgi Gerganov
2024-03-09  server : fix passing prompt as tokens (#5955)  Alexey Parfenov
2024-03-09  ggml : add ggml-common.h to deduplicate shared code (#5940)  Georgi Gerganov
2024-03-09  server : simplify logic for empty prompts (#5953)  Georgi Gerganov
2024-03-09  Server: reorganize some http logic (#5939)  Xuan Son Nguyen
2024-03-09  server : add SSL support (#5926)  Gabe Goodhart
2024-03-09  server: tests: add truncated prompt tests, better kv cache size (#5933)  Pierrick Hymbert
2024-03-08  llama : support Mamba Selective State Space Models (#5328)  compilade
2024-03-08  llama : fix quantization of shared token_embd (#5944)  compilade
2024-03-08  server: metrics: add llamacpp:prompt_seconds_total and llamacpp:tokens_predic...  Pierrick Hymbert
2024-03-08  llama : assume tied weights if lm_head/output weights is missing (#5824)  Don Mahurin
2024-03-08  server : fix EOS token detection with disabled cache (#5938)  Georgi Gerganov
2024-03-08  log : fix MSVC compile errors (#5643)  UEXTM.com
2024-03-07  llama-bench : add embeddings option (#5924)  Georgi Gerganov
2024-03-07  Revert "[SYCL] fix error when set main gpu to non-zero (#5901)" (#5918)  Neo Zhang Jianyu
2024-03-07  server : add `/v1/completions` endpoint (#5914)  Minsoo Cheong
2024-03-07  server : refactor (#5882)  Georgi Gerganov
2024-03-07  [SYCL] fix error when set main gpu to non-zero (#5901)  Neo Zhang Jianyu
2024-03-06  ggml : use SYS_get_cpu if SYS_getcpu is not defined (#5906)  Jared Van Bortel
2024-03-06  ggml : use `uint8x16_t` return type for `ggml_vqtbl1q_u8` (#5894)  bobqianic
2024-03-06  convert : remove AWQ remnants (#5768)  Georgi Gerganov
2024-03-06  add wait() to make code stable (#5895)  Neo Zhang Jianyu
2024-03-05  compare-llama-bench.py : remove mul_mat_q (#5892)  slaren
2024-03-05  quants : use MM256_SET_M128I consistently to fix gcc 7 build (#5889)  Jared Van Bortel
2024-03-05  grammars : blacklists character control set (#5888)  ExtReMLapin
2024-03-05  Revert "grammars : don't allow to output unescaped new line in string (#5885)"  Georgi Gerganov
2024-03-05  grammars : don't allow to output unescaped new line in string (#5885)  ExtReMLapin
2024-03-05  Vulkan Improvements (#5835)  0cc4m
2024-03-05  [SYCL] fix mul_mat fault in CI/unit-test (#5862)  Neo Zhang Jianyu
2024-03-05  fix editorconfig check break (#5879)  Minsoo Cheong
2024-03-04  fix speculative decoding build on windows (#5874)  Jeffrey Quesnelle
2024-03-04  nix: static build (#5814)  hutli
2024-03-04  llama : fix embeddings (#5796)  Georgi Gerganov
2024-03-04  flake : fix  Georgi Gerganov
2024-03-04  ggml : fix unknown status (#0)  Georgi Gerganov
2024-03-04  sync : ggml  Georgi Gerganov
2024-03-04  ggml : introduce ggml_status (ggml/750)  Michael Podvitskiy
2024-03-04  cmake : handle cases where git index is not found in .git (#5844)  Dane Madsen
2024-03-04  speculative : implement stochastic speculative sampling (#5625)  Minsoo Cheong
2024-03-04  add alias for chat template (#5858)  Xuan Son Nguyen
2024-03-04  sync : ggml  Georgi Gerganov
2024-03-04  add some new ops, fix some operators and add batch operations to certain oper...  leejet
2024-03-04  common : use LLAMA_DEFAULT_SEED (#5855)  DAN™
2024-03-04  main : support special tokens as reverse/anti prompt (#5847)  DAN™
2024-03-03  cuda : fix data race in soft max (#5853)  slaren
2024-03-03  readme : add API changes section  Georgi Gerganov
2024-03-03  llama : allow for user specified embedding pooling type (#5849)  Douglas Hanley
2024-03-03  gguf-dump : support i-quants (#5841)  Nindaleth