Age         Commit message  [Author]
2024-03-21  build : add mac pre-built binaries (#6182)  [Vaibhav Srivastav]
* Initial commit - add mac prebuilds.
* forward contribution credits for building the workflow.
* minor : remove trailing whitespaces
---------
Co-authored-by: Nicolas Patry <Narsil@users.noreply.github.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-03-21  Add ability to use Q5_0, Q5_1, and IQ4_NL for quantized K cache (#6183)  [Kawrakow]
* k_cache: be able to use Q5_0
* k_cache: be able to use Q5_1 on CUDA
* k_cache: be able to use Q5_0 on Metal
* k_cache: be able to use Q5_1 on Metal
* k_cache: be able to use IQ4_NL - just CUDA for now
* k_cache: be able to use IQ4_NL on Metal
* k_cache: add newly added supported types to llama-bench and CUDA supports_op
---------
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
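The new cache types are selected at run time; a minimal usage sketch, assuming the `-ctk`/`--cache-type-k` flag that `main` and `llama-bench` use for the K cache type (model path illustrative):

    # run with a Q5_0-quantized K cache
    ./main -m models/7B/ggml-model-q4_0.gguf -p "Hello" -ctk q5_0
    # benchmark one of the newly supported types
    ./llama-bench -m models/7B/ggml-model-q4_0.gguf -ctk iq4_nl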
2024-03-21  Add nvidia and amd backends (#6157)  [AidanBeltonS]
2024-03-21  cuda : fix conflict with std::swap (#6186)  [slaren]
2024-03-20  cuda : print the returned error when CUDA initialization fails (#6185)  [slaren]
2024-03-20  llava : update MobileVLM-README.md (#6180)  [Ziang Wu]
2024-03-20  llava : add MobileVLM_V2 backup (#6175)  [Ziang Wu]
* Add MobileVLM_V2 backup
* Update MobileVLM-README.md
* Update examples/llava/MobileVLM-README.md
  Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Update examples/llava/convert-image-encoder-to-gguf.py
  Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* clip : fix whitespace
* fix definition mistake in clip.cpp
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-03-20  cuda : refactor to remove global resources (#6170)  [slaren]
* cuda : refactor to remove global resources
2024-03-20  Server: version bump for httplib and json (#6169)  [Xuan Son Nguyen]
* server : version bump for httplib and json
* fix build
* bring back content_length
2024-03-20  gitignore : ignore curl-related files  [Georgi Gerganov]
2024-03-20  server : allow overriding -ngl in tests (#6170)  [Georgi Gerganov]
2024-03-20  Revert "llava : add a MobileVLM_V2-1.7B backup (#6152)"  [Georgi Gerganov]
This reverts commit f8c4e745e1e728204ab26dbadf52853545e6789c.
2024-03-20  llava : add a MobileVLM_V2-1.7B backup (#6152)  [Ziang Wu]
* Add MobileVLM_V2 backup
* Update MobileVLM-README.md
* Update examples/llava/MobileVLM-README.md
  Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Update examples/llava/convert-image-encoder-to-gguf.py
  Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* clip : fix whitespace
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-03-20  Server: Handle n_keep parameter in the request (#6174)  [Karthick]
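For context, `n_keep` tells the server how many prompt tokens to keep when the context window overflows and it truncates; a minimal request sketch against the server's `/completion` endpoint (host, port, and values illustrative):

    curl http://localhost:8080/completion -d '{
      "prompt": "Once upon a time",
      "n_predict": 64,
      "n_keep": 16
    }'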
2024-03-20  server tests : more pythonic process management; fix bare `except:` (#6146)  [Jared Van Bortel]
* server tests : remove seemingly redundant newlines in print()
* server tests : use built-in subprocess features, not os.kill and psutil
* server tests : do not catch e.g. SystemExit; use print_exc
* server tests : handle TimeoutExpired exception
* server tests : fix connect on dual-stack systems
* server tests : add new tokens regex on windows following new repeat penalties default changed in (#6127)
* server tests : remove the hack on windows since now we get the good socket family
* server tests : add new tokens regex following new repeat penalties default changed in (#6127)
---------
Co-authored-by: Pierrick HYMBERT <pierrick.hymbert@gmail.com>
2024-03-20  update readme sycl for new update (#6151)  [Neo Zhang Jianyu]
* update readme sycl for new update
* Update README-sycl.md (×4)
  Co-authored-by: Abhilash Majumder <30946547+abhilash1910@users.noreply.github.com>
* Update README-sycl.md (×2)
  Co-authored-by: AidanBeltonS <87009434+AidanBeltonS@users.noreply.github.com>
* update by review comments
* update w64devkit link
* update for verify device id part
* Update README-sycl.md
  Co-authored-by: Meng, Hengyu <airdldl@163.com>
---------
Co-authored-by: Abhilash Majumder <30946547+abhilash1910@users.noreply.github.com>
Co-authored-by: AidanBeltonS <87009434+AidanBeltonS@users.noreply.github.com>
Co-authored-by: Meng, Hengyu <airdldl@163.com>
2024-03-20  increase igpu cluster limit (#6159)  [Abhilash Majumder]
2024-03-19  Remove unneeded header file. (#6158)  [DAN™]
2024-03-19  gguf-split: split and merge gguf per batch of tensors (#6135)  [Pierrick Hymbert]
* gguf-split: split and merge gguf files per tensor
* gguf-split: build with make toolchain
* gguf-split: rename `--split-tensors-size` to `--split-max-tensors`; set the general.split_count KV on all splits
* split : minor style + fix compile warnings
* gguf-split: remove `--upload` (not implemented)
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
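A hedged usage sketch of the new tool, using only the `--split-max-tensors` flag named above plus an assumed `--split`/`--merge` mode pair (paths, shard names, and the tensor budget are illustrative):

    # split a model into chunks of at most 128 tensors each
    ./gguf-split --split --split-max-tensors 128 ggml-model-f16.gguf ggml-model-f16
    # merge the resulting shards back into one file
    ./gguf-split --merge ggml-model-f16-00001-of-00003.gguf merged.gguf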
2024-03-19  common : disable repeat penalties by default (#6127)  [Georgi Gerganov]
2024-03-19  ci : exempt some labels from being tagged as stale (#6140)  [slaren]
2024-03-19  common : print usage on '-h' and '--help' (#6145)  [DAN™]
2024-03-18  flake.lock: Update  [github-actions[bot]]
Flake lock file updates:
• Updated input 'nixpkgs':
    'github:NixOS/nixpkgs/9df3e30ce24fd28c7b3e2de0d986769db5d6225d' (2024-03-06)
  → 'github:NixOS/nixpkgs/d691274a972b3165335d261cc4671335f5c67de9' (2024-03-14)
2024-03-18  mpt : implement backwards compatibility with duped output tensor (#6139)  [Jared Van Bortel]
2024-03-18  clip : fix memory leak (#6138)  [Felix]
2024-03-18  backend : set max split inputs to GGML_MAX_SRC (#6137)  [slaren]
2024-03-18  ci : disable stale issue messages (#6126)  [Georgi Gerganov]
2024-03-18  ci : temporarily disable sanitizer builds (#6128)  [Georgi Gerganov]
2024-03-18  backend : offload large batches to GPU (#6083)  [slaren]
* backend : offload large batches to GPU
* fix hip
* code cleanup
* fix CUDA split buffers
* Update ggml-backend-impl.h
  Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
* cuda : fix memset without set_device
* imatrix : remove sched affix from weight names
* sched : add a new split if the current one has too many inputs; reduce max inputs per split; more cleanup
* update backends
ggml-ci
---------
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2024-03-18  common : tidy-up argument parsing (#6105)  [DAN™]
* Tidy-up argument parsing.
* Missing ref.
* common : minor
* common : add static classifier
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-03-18  convert : add support for CamembertModel architecture (#6119)  [Thérence]
Adds support for the CamembertModel architecture used by:
https://huggingface.co/dangvantuan/sentence-camembert-large
2024-03-18  convert : use f32 outtype for bf16 tensors (#6106)  [Romain D]
The old behaviour is to use f16, but bf16 to f16 is not a lossless conversion. Change the outtype to f32 to default to a lossless conversion.
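The loss comes from the exponent width: bf16 keeps f32's 8 exponent bits, while f16 has only 5, so large bf16 magnitudes overflow f16 entirely. A small demonstration sketch (NumPy has no native bfloat16, so the bf16 value is built by truncating f32 bits; the value is illustrative):

    python3 - <<'EOF'
    import numpy as np
    # bf16 is the top 16 bits of an IEEE-754 f32, so bf16 -> f32 is exact
    x = np.array([3.0e38], dtype=np.float32)             # within bf16 range
    bf16 = (x.view(np.uint32) & np.uint32(0xFFFF0000)).view(np.float32)
    print(bf16.astype(np.float32))  # ~2.99e+38 -- lossless round-trip
    print(bf16.astype(np.float16))  # inf       -- f16 max is ~65504
    EOF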
2024-03-17  common : llama_load_model_from_url using --model-url (#6098)  [Pierrick Hymbert]
* common : llama_load_model_from_url with libcurl dependency
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
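In practice the flag downloads the file with libcurl to the local model path before loading it; a minimal usage sketch (the URL and local path are illustrative):

    ./main -p "Hello" \
           --model-url https://huggingface.co/ggml-org/models/resolve/main/phi-2/ggml-model-q4_0.gguf \
           -m models/phi-2-q4_0.gguf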
2024-03-17  ci : close all stale issues at once (#6115)  [Georgi Gerganov]
2024-03-17  ggml : fix finding transfer queue family index error (#6094)  [GainLee]
Co-authored-by: GainLee <ligen@meizu.com>
2024-03-16  ggml : add AVX512F SIMD (#6088)  [AmirAli Mirian]
2024-03-16  gritlm : add initial README.md (#6086)  [Daniel Bevenius]
* gritlm: add initial README.md to examples/gritlm
  This commit adds a suggestion for an initial README.md for the gritlm example.
  Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
* squash! gritlm: add initial README.md to examples/gritlm
  Use the `scripts/hf.sh` script to download the model file.
  Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
* squash! gritlm: add initial README.md to examples/gritlm
  Fix editorconfig-checker error in examples/gritlm/README.md.
  Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
---------
Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
2024-03-16  readme : add wllama as a wasm binding (#6100)  [Xuan Son Nguyen]
2024-03-16  common : refactor nested if causing error C1061 on MSVC (#6101)  [DAN™]
* Refactor nested if causing error C1061 on MSVC.
* Revert back and remove else's.
* Add flag to track found arguments.
2024-03-16  ci : close inactive issues with workflow (#6053)  [Pierrick Hymbert]
* issues : ci - close inactive issues with workflow
* ci : close issue, change workflow schedule time
2024-03-15  llama : fix Baichuan2 13B (#6092)  [slaren]
2024-03-15  llama : add support for control vectors (#5970)  [Theia Vogel]
* control vector api and implementation
* control-vectors : minor code style updates
* disable control vector when data == nullptr
  use -1 for disabled range (also on init) in case we ever support controlling layer 0 (embeddings)
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
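Control vectors steer generation by adding a learned direction to the hidden state at selected layers. A hedged usage sketch; the `--control-vector` and `--control-vector-scaled` flag names are assumptions about what this feature exposes, and the file names are illustrative:

    # apply a control vector at its default strength
    ./main -m model.gguf -p "I feel" --control-vector happy.gguf
    # same vector, explicitly scaled
    ./main -m model.gguf -p "I feel" --control-vector-scaled happy.gguf 0.8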
2024-03-15  llama : add Command-R support (#6033)  [Andrew Canis]
Information about the Command-R 35B model (128k context) can be found at:
https://huggingface.co/CohereForAI/c4ai-command-r-v01
Based on the llama2 model with a few changes:
1) New hyperparameter to scale output logits (logit_scale)
2) Uses LayerNorm instead of RMSNorm
3) Transformer layers have a single shared LayerNorm that feeds into both the
   self-attention and FFN layers in parallel. There is no post-attention LayerNorm.
4) No support for Rotary Position Embeddings (RoPE) scaling
5) No biases used
Find GGUF files here: https://huggingface.co/andrewcanis/c4ai-command-r-v01-GGUF
To convert the model to GGUF format yourself:
1) Download the Command-R Hugging Face safetensors:
   git lfs install
   git clone https://huggingface.co/CohereForAI/c4ai-command-r-v01
2) Run:
   python3 convert-hf-to-gguf.py --outtype f16 ./c4ai-command-r-v01
2024-03-15  llava : change API to pure C style for Rust FFI bindgen (#6079)  [Ting Lou]
Co-authored-by: Lou Ting <louting.t@alibaba-inc.com>
2024-03-15  cuda : disable unused cudaLaunchHostFunc code (#6078)  [slaren]
2024-03-15  fix set main gpu error (#6073)  [Neo Zhang Jianyu]
2024-03-15  make : ggml-metal.o depends on ggml.h  [Georgi Gerganov]
2024-03-15  [SYCL] Fix non-intel device selection (#6042)  [AidanBeltonS]
* Fix non-intel device selection
* Update ggml-sycl.cpp
  Co-authored-by: Neo Zhang Jianyu <jianyu.zhang@intel.com>
* Update ggml-sycl.cpp
  Co-authored-by: Neo Zhang Jianyu <jianyu.zhang@intel.com>
---------
Co-authored-by: Abhilash Majumder <30946547+abhilash1910@users.noreply.github.com>
Co-authored-by: Neo Zhang Jianyu <jianyu.zhang@intel.com>
2024-03-15  gguf : add support for I64 and F64 arrays (#6062)  [Ondřej Čertík]
* gguf : add support for I64 and F64 arrays

GGML currently does not support I64 or F64 arrays, and they are not often used in machine learning. However, if the need arises in the future, it is worth adding them now, so that the types sit next to the other types I8, I16, I32 in the enums, and it also reserves their type numbers.

Furthermore, with this addition the GGUF format becomes very usable for most computational applications of NumPy (being compatible with the most common NumPy dtypes: i8, i16, i32, i64, f32, f64), providing a faster and more versatile alternative to the `npz` format, and a simpler alternative to the `hdf5` format.

The change in this PR seems small and should not significantly increase the maintenance burden. I tested this from Python using GGUFWriter/Reader and `gguf-dump`, as well as from C; everything seems to work.

* Fix compiler warnings
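A minimal round-trip sketch of the NumPy use case described above, via the gguf-py package (the GGUFWriter call names follow gguf-py's API at the time; treat the exact signatures as assumptions):

    python3 - <<'EOF'
    import numpy as np
    from gguf import GGUFWriter

    w = GGUFWriter("arrays.gguf", "example")
    w.add_tensor("ids",  np.arange(8, dtype=np.int64))            # I64
    w.add_tensor("vals", np.linspace(0, 1, 8, dtype=np.float64))  # F64
    w.write_header_to_file()
    w.write_kv_data_to_file()
    w.write_tensors_to_file()
    w.close()
    EOF
    # inspect the result
    gguf-dump arrays.gguf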
2024-03-15  llama : add Orion chat template (#6066)  [Xuan Son Nguyen]