Age | Commit message | Author
2024-03-29 | Vulkan k-quant mmq and ggml-backend offload functionality (#6155) | 0cc4m
2024-03-29 | sync : ggml (#6351) | Georgi Gerganov
2024-03-29 | [Model] Add support for xverse (#6301) | hxer7963
2024-03-29 | ci : fix BGE wget (#6383) | Georgi Gerganov
2024-03-29 | readme : add project (#6356) | zhouwg
2024-03-29 | cmake : add explicit metal version options (#6370) | Matt Clayton
2024-03-29 | llama : remove redundant reshape in build_kv_store (#6369) | Daniel Bevenius
2024-03-29 | convert : allow conversion of Mistral HF models (#6144) | Pedro Cuenca
2024-03-28 | readme : add notice for UI list | Georgi Gerganov
2024-03-28 | [SYCL] Revisited & updated SYCL build documentation (#6141) | Ouadie EL FAROUKI
2024-03-28 | convert : refactor vocab selection logic (#6355) | Jared Van Bortel
2024-03-28 | llava : fix MobileVLM (#6364) | Ziang Wu
2024-03-28 | llama : fix command-r inference when omitting outputs (#6367) | compilade
2024-03-28 | ci: bench: fix master not schedule, fix commit status failed on external repo... | Pierrick Hymbert
2024-03-28 | doc: fix outdated default value of batch size (#6336) | Ting Sun
2024-03-28 | server : stop gracefully on SIGTERM (#6348) | Eric Zhang
2024-03-28 | nix: removed unnessesary indentation | hutli
2024-03-28 | nix: moved blas availability check to package inputs so it is still overridable | hutli
2024-03-28 | using blas.meta.available to check host platform | hutli
2024-03-28 | only using explicit blas if hostPlatform is allowed | hutli
2024-03-28 | nix: .#windows: proper cross-compilation set-up | Someone Serge
2024-03-28 | nix: package: don't introduce the dependency on python | Someone Serge
2024-03-28 | nix: .#widnows: init | hutli
2024-03-28 | doc: fix typo in MobileVLM-README.md (#6181) | Ziang Wu
2024-03-28 | [SYCL] fix set main gpu crash (#6339) | Neo Zhang Jianyu
2024-03-27 | server: continuous performance monitoring and PR comment (#6283) | Pierrick Hymbert
2024-03-27 | nix: ci: dont test cuda and rocm (for now) | Someone Serge
2024-03-27 | ggml : fix bounds checking of zero size views (#6347) | slaren
2024-03-27 | make : whitespace | Georgi Gerganov
2024-03-27 | embedding : show full embedding for single prompt (#6342) | howlger
2024-03-27 | [SYCL] Fix batched impl for NVidia GPU (#6164) | AidanBeltonS
2024-03-27 | Make IQ1_M work for QK_K = 64 (#6327) | Kawrakow
2024-03-27 | common : change --no-penalize-nl to --penalize-nl (#6334) | Sigbjørn Skjæret
2024-03-27 | llama2c : open file as binary (#6332) | Georgi Gerganov
2024-03-27 | readme : add php api bindings (#6326) | Mateusz Charytoniuk
2024-03-27 | server: public: use relative routes for static files (#6325) | Eric Zhang
2024-03-27 | [SYCL] fix no file in win rel (#6314) | Neo Zhang Jianyu
2024-03-26 | wpm : portable unicode tolower (#6305) | Jared Van Bortel
2024-03-26 | llama : greatly reduce output buffer memory usage (#6122) | compilade
2024-03-26 | IQ1_M: 1.75 bpw quantization (#6302) | Kawrakow
2024-03-26 | convert-hf : fix exception in sentencepiece with added tokens (#6320) | Pedro Cuenca
2024-03-26 | quantize : be able to override metadata by key (#6321) | Kawrakow
2024-03-26 | embedding : adjust `n_ubatch` value (#6296) | Minsoo Cheong
2024-03-26 | server : add `n_discard` parameter (#6300) | Jan Boon
2024-03-25 | nix: make `xcrun` visible in Nix sandbox for precompiling Metal shaders (#6118) | Joseph Stahl
2024-03-26 | cuda : rename build flag to LLAMA_CUDA (#6299) | slaren
2024-03-25 | nix: fix blas support (#6281) | Christian Kögler
2024-03-25 | tests : include IQ2_XXS and IQ2_XS in test-quantize-fns (#6303) | Kawrakow
2024-03-25 | flake.lock: Update (#6266) | Georgi Gerganov
2024-03-25 | cuda : fix LLAMA_CUDA_F16 build (#6298) | slaren