Age | Commit message | Author
2023-11-13 | ggml : sync (im2col, GPU conv, 32-bit arm compat) (#4060) | Georgi Gerganov
2023-11-13 | readme : update hot topics | Georgi Gerganov
2023-11-13 | sync : ggml (backend v2) (#3912) | Georgi Gerganov
2023-11-13 | Add ReLU and SQR CUDA ops to (partially) fix Persimmon offloading (#4041) | Kerfuffle
2023-11-12 | gguf-py: gguf_writer: Use bytearray to build metadata (#4051) | Kerfuffle
2023-11-11 | Fix some documentation typos/grammar mistakes (#4032) | Richard Kiss
2023-11-11 | Fix gguf-convert-endian script (#4037) | M. Yusuf Sarıgöz
2023-11-10 | server : fix crash when prompt exceeds context size (#3996) | Alexey Parfenov
2023-11-11 | gguf-py: Refactor and allow reading/modifying existing GGUF files (#3981) | Kerfuffle
2023-11-10 | server : allow continue edit on completion mode (#3950) | Jhen-Jie Hong
2023-11-10 | Unbreak persimmon after #3837 (#4010) | Galunid
2023-11-09 | scripts: Generalize convert scripts (#3838) | Galunid
2023-11-08 | server : add min_p param (#3877) | Mihai
2023-11-08 | ggml-alloc : fix backend assignments of views (#3982) | slaren
2023-11-07 | gguf : track writer state, free unneeded tensors, cleanup (#3871) | Jared Van Bortel
2023-11-07 | make : do not add linker flags when compiling static llava lib (#3977) | Georgi Gerganov
2023-11-07 | ggml : fix backward rope after YaRN (#3974) | xaedes
2023-11-07 | Use params when loading models in llava-cli (#3976) | Matthew Tejo
2023-11-07 | cuda : supports running on CPU for GGML_USE_CUBLAS=ON build (#3946) | Meng Zhang
2023-11-07 | llava : expose as a shared library for downstream projects (#3613) | Damian Stewart
2023-11-05 | ggml-cuda : fix f16 mul mat (#3961) | slaren
2023-11-05 | Allow common process_escapes to handle \x sequences (#3928) | Kerfuffle
2023-11-05 | server : fix typo for --alias shortcut from -m to -a (#3958) | Thái Hoàng Tâm
2023-11-05 | cuda : fix disabling device with --tensor-split 1,0 (#3951) | Jared Van Bortel
2023-11-05 | llama : mark LLM_ARCH_STARCODER as full offload supported (#3945) | Meng Zhang
2023-11-05 | cmake : MSVC instruction detection (fixed up #809) (#3923) | Eve
2023-11-05 | ci : use intel sde when ci cpu doesn't support avx512 (#3949) | Eve
2023-11-05 | cuda : revert CUDA pool stuff (#3944) | slaren
2023-11-04 | gguf-py: Support 01.AI Yi models (#3943) | Kerfuffle
2023-11-03 | metal : round up to 16 to fix MTLDebugComputeCommandEncoder assertion (#3938) | Peter Sugihara
2023-11-03 | ggml-metal: fix yarn rope (#3937) | Xiao-Yong Jin
2023-11-03 | ggml-cuda : move row numbers to x grid dim in mmv kernels (#3921) | slaren
2023-11-03 | speculative : change default p_accept to 0.5 + CLI args (#3919) | Georgi Gerganov
2023-11-03 | common : YAYF (yet another YARN fix) (#3925) | Georgi Gerganov
2023-11-03 | llama : change yarn_ext_factor placeholder to -1 (#3922) | cebtenzzre
2023-11-02 | cuda : add ROCM aliases for CUDA pool stuff (#3918) | Kerfuffle
2023-11-02 | cmake : fix relative path to git submodule index (#3915) | Andrei
2023-11-02 | readme : add notice about #3912 | Georgi Gerganov
2023-11-02 | cuda : fix const ptrs warning causing ROCm build issues (#3913) | Georgi Gerganov
2023-11-02 | cuda : use CUDA memory pool with async memory allocation/deallocation when av... | Oleksii Maryshchenko
2023-11-02 | gguf : print error for GGUFv1 files (#3908) | Georgi Gerganov
2023-11-02 | cmake : disable LLAMA_NATIVE by default (#3906) | slaren
2023-11-02 | gguf : remove special-case code for GGUFv1 (#3901) | Georgi Gerganov
2023-11-02 | llm : prevent from 1-D tensors being GPU split (#3697) | Georgi Gerganov
2023-11-02 | build : link against build info instead of compiling against it (#3879) | cebtenzzre
2023-11-02 | cuda : check if this fixes Pascal card regression (#3882) | Georgi Gerganov
2023-11-02 | metal : fix build errors and kernel sig after #2268 (#3898) | Georgi Gerganov
2023-11-02 | cuda : fix RoPE after #2268 (#3897) | cebtenzzre
2023-11-01 | llama : fix llama_context_default_params after #2268 (#3893) | cebtenzzre
2023-11-01 | ggml-cuda : compute ptrs for cublasGemmBatchedEx in a kernel (#3891) | slaren