Age | Commit message | Author
2024-01-04server : fix options in README.md (#4765)Michael Coppola
* fix examples/server/README.md * minor : fix whitespace --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-01-04ggml : include stdlib.h before intrin.h (#4736)Georgi Gerganov
2024-01-04llama.swiftui : fix build of ggml.metallib (#4754)singularity
* metal: fix metal backend init failure in swiftui * metal: build ggml.metallib instead of copy src * llama.swift : remove debug flags from metallib build --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-01-03train : fix typo in overlapping-samples help msg (#4758)Daniel Bevenius
This commit fixes a typo in the help message for the --overlapping-samples option. Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
2024-01-03swift : update Package.swift to use ggml as dependency (#4691)Ashraful Islam
* updates the package.swift to use ggml as dependency * changes the ggml package url src to ggerganov
2024-01-03cuda : simplify expressionGeorgi Gerganov
Co-authored-by: slaren <slarengh@gmail.com>
2024-01-03cuda : mark I16 and I32 ops as unsupportedGeorgi Gerganov
ggml-ci
2024-01-03sync : ggmlGeorgi Gerganov
ggml-ci
2024-01-03metal : add kernel_get_rows_i32Georgi Gerganov
ggml-ci
2024-01-03scripts : fix sync order + metal sedGeorgi Gerganov
2024-01-03ggml : extend ggml_get_rows, ggml_repeat, ggml_concat (ggml/639)Guillaume Wenzek
* add more int ops * ggml_compute_forward_dup_bytes * add tests * PR comments * tests : minor indentations --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-01-03server : throw an error when `slot unavailable` (#4741)Justin Parker
2024-01-02metal : optimize ggml_mul_mat_id (faster Mixtral PP) (#4725)Georgi Gerganov
* ggml : disable fast-math for Metal (cmake build only) ggml-ci * metal : fix Metal API debug warnings * cmake : add -fno-inline for Metal build (#4545) * metal : fix API debug warnings * metal : fix compile warnings * metal : use uint64_t for strides * cmake : rename option to LLAMA_METAL_SHADER_DEBUG * metal : fix mat-vec Q8_0 kernel for BS > 1 * metal : normalize mat-vec kernel signatures * cmake : respect LLAMA_QKK_64 option * metal : fix mat-vec Q4_K kernel for QK_K == 64 * metal : optimizing ggml_mul_mat_id (wip) * metal : minor fix * metal : opt mul_mm_id
2024-01-02server : add token counts to html footer (#4738)Phil H
* server: add token counts to stats * server: generate hpp --------- Co-authored-by: phiharri <ph@got-root.co.uk>
2024-01-02llama : llama_model_desc print number of expertsGeorgi Gerganov
2024-01-02llama : replace all API facing `int`'s with `int32_t` (#4577)Marcus Dunn
* replaced all API facing `int`'s with `int32_t` * formatting and missed `int` in `llama_token_to_piece`
2024-01-02llama : differentiate the KV dims in the attention (#4657)postmasters
* Add n_key_dim and n_value_dim Some models use values that are not derived from `n_embd`. Also remove `n_embd_head` and `n_embd_gqa` because it is not clear which "head" is referred to (key or value). Fix issue #4648. * Fix `llm_build_kqv` to use `n_value_gqa` * Rebase * Rename variables * Fix llm_build_kqv to be more generic wrt n_embd_head_k * Update default values for n_embd_head_k and n_embd_head_v Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Fix llm_load_tensors: the asserts were not backcompat --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
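To illustrate the idea in the commit above, here is a minimal sketch of keeping separate key and value head sizes instead of deriving both from `n_embd`. The struct `AttnDims` and the helper `with_default_head_dims` are hypothetical names chosen for this example; only `n_embd`, `n_embd_head_k`, and `n_embd_head_v` appear in the commit message, and the default of `n_embd / n_head` is an assumption about what "default values" means here.

```cpp
#include <cstdint>

// Hypothetical container for attention dimensions (not the project's actual struct).
struct AttnDims {
    uint32_t n_embd;        // model embedding width
    uint32_t n_head;        // number of query heads
    uint32_t n_head_kv;     // number of key/value heads (fewer than n_head under GQA)
    uint32_t n_embd_head_k; // per-head key dimension
    uint32_t n_embd_head_v; // per-head value dimension

    // Total key width across all KV heads.
    uint32_t n_embd_k_gqa() const { return n_embd_head_k * n_head_kv; }
    // Total value width across all KV heads; may differ from the key width.
    uint32_t n_embd_v_gqa() const { return n_embd_head_v * n_head_kv; }
};

// Assumed default: both head sizes fall back to n_embd / n_head unless overridden.
AttnDims with_default_head_dims(uint32_t n_embd, uint32_t n_head, uint32_t n_head_kv) {
    const uint32_t head = n_embd / n_head;
    return AttnDims{n_embd, n_head, n_head_kv, head, head};
}
```

A model whose value heads are narrower than its key heads would override `n_embd_head_v` after construction, and any buffer sized from `n_embd_v_gqa()` then shrinks accordingly while the key side is unaffected.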
2024-01-02editorconfig : fix whitespace and indentation #4710Georgi Gerganov
2024-01-02server : add --override-kv parameter (#4710)minarchist
* Changes to server to allow metadata override * documentation * flake.nix: expose full scope in legacyPackages * flake.nix: rocm not yet supported on aarch64, so hide the output * flake.nix: expose checks * workflows: nix-ci: init; build flake outputs * workflows: nix-ci: add a job for eval * workflows: weekly `nix flake update` * workflows: nix-flakestry: drop tag filters ...and add a job for flakehub.com * workflows: nix-ci: add a qemu job for jetsons * flake.nix: suggest the binary caches * flake.lock: update to a commit recently cached by nixpkgs-cuda-ci --------- Co-authored-by: John <john@jLap.lan> Co-authored-by: Someone Serge <sergei.kozlukov@aalto.fi>
2024-01-02py : re-enable mmap in convert hf (#4732)Nam D. Tran
* update: awq support llama-7b model * update: change order * update: benchmark results for llama2-7b * update: mistral 7b v1 benchmark * update: support 4 models * fix: Readme * update: ready for PR * update: readme * fix: readme * update: change order import * black * format code * update: work for bot mpt and awqmpt * update: readme * Rename to llm_build_ffn_mpt_awq * Formatted other files * Fixed params count * fix: remove code * update: more detail for mpt * fix: readme * fix: readme * update: change folder architecture * fix: common.cpp * fix: readme * fix: remove ggml_repeat * update: cicd * update: cicd * uppdate: remove use_awq arg * update: readme * llama : adapt plamo to new ffn ggml-ci * fix: update torch version --------- Co-authored-by: Trần Đức Nam <v.namtd12@vinai.io> Co-authored-by: Le Hoang Anh <v.anhlh33@vinai.io> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-01-02finetune: fix typo in README.md (#4733)Daniel Bevenius
Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
2024-01-02metal : enable shader debugging (cmake option) (#4705)Georgi Gerganov
* ggml : disable fast-math for Metal (cmake build only) ggml-ci * metal : fix Metal API debug warnings * cmake : add -fno-inline for Metal build (#4545) * metal : fix API debug warnings * metal : fix compile warnings * metal : use uint64_t for strides * cmake : rename option to LLAMA_METAL_SHADER_DEBUG * metal : fix mat-vec Q8_0 kernel for BS > 1 * metal : normalize mat-vec kernel signatures * cmake : respect LLAMA_QKK_64 option * metal : fix mat-vec Q4_K kernel for QK_K == 64 ggml-ci
2023-12-31flake.lock: updateSomeone Serge
to a commit recently cached by nixpkgs-cuda-ci
2023-12-31flake.nix: suggest the binary cachesSomeone Serge
2023-12-31workflows: nix-ci: add a qemu job for jetsonsSomeone Serge
2023-12-31workflows: nix-flakestry: drop tag filtersSomeone Serge
...and add a job for flakehub.com
2023-12-31workflows: weekly `nix flake update`Someone Serge
2023-12-31workflows: nix-ci: add a job for evalSomeone Serge
2023-12-31workflows: nix-ci: init; build flake outputsSomeone Serge
2023-12-31flake.nix: expose checksSomeone Serge
2023-12-31flake.nix: rocm not yet supported on aarch64, so hide the outputSomeone Serge
2023-12-31flake.nix: expose full scope in legacyPackagesSomeone Serge
2023-12-31ggml : add ggml_vdotq_s32 alias (#4715)Georgi Gerganov
ggml-ci
2023-12-30clip : refactor + bug fixes (#4696)Georgi Gerganov
* clip : refactor + bug fixes ggml-ci * server : add log message
2023-12-30CUDA: fixed tensor cores not being used on RDNA3 (#4697)Johannes Gäßler
2023-12-30ggml : add ggml_cpu_has_avx_vnni() (#4589)automaticcat
* feat: add avx_vnni based on intel documents * ggml: add avx vnni based on intel document * llama: add avx vnni information display * docs: add more details about using oneMKL and oneAPI for intel processors * docs: add more details about using oneMKL and oneAPI for intel processors * docs: add more details about using oneMKL and oneAPI for intel processors * docs: add more details about using oneMKL and oneAPI for intel processors * docs: add more details about using oneMKL and oneAPI for intel processors * Update ggml.c Fix indentation upgate Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
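A feature query of this kind can be sketched as a compile-time check. The function name `cpu_has_avx_vnni_sketch` is hypothetical, and the sketch assumes the compiler defines `__AVXVNNI__` when AVX-VNNI code generation is enabled (as GCC and Clang do with `-mavxvnni`); it is not a copy of the project's implementation and does no runtime CPUID detection.

```cpp
// Returns 1 if this translation unit was compiled with AVX-VNNI enabled, else 0.
// Assumption: the compiler signals the feature through the __AVXVNNI__ macro.
int cpu_has_avx_vnni_sketch() {
#if defined(__AVXVNNI__)
    return 1;
#else
    return 0;
#endif
}
```

Because the answer is fixed at build time, it reports what the binary was built to use, not what the host CPU supports.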
2023-12-29CUDA: fix tensor core logic for Pascal and HIP (#4682)Johannes Gäßler
2023-12-29clip : use ggml_backend_buffer_is_host (#4205)Georgi Gerganov
2023-12-29clip : enable gpu backend (#4205)Steward Garcia
* clip: enable CUDA backend * add missing kernels * add enough padding for alignment * remove ggml_repeat of clip.cpp * add metal backend * llava : fixes - avoid ggml_repeat - use GGML_USE_ instead of CLIP_USE_ macros - remove unused vars --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-12-29cuda: fix vmm oom issue on NVIDIA AGX Orin (#4687)hydai
Signed-off-by: hydai <hydai@secondstate.io>
2023-12-29python : add check-requirements.sh and GitHub workflow (#4585)crasm
* python: add check-requirements.sh and GitHub workflow This script and workflow forces package versions to remain compatible across all convert*.py scripts, while allowing secondary convert scripts to import dependencies not wanted in convert.py. * Move requirements into ./requirements * Fail on "==" being used for package requirements (but can be suppressed) * Enforce "compatible release" syntax instead of == * Update workflow * Add upper version bound for transformers and protobuf * improve check-requirements.sh * small syntax change * don't remove venvs if nocleanup is passed * See if this fixes docker workflow * Move check-requirements.sh into ./scripts/ --------- Co-authored-by: Jared Van Bortel <jared@nomic.ai>
2023-12-29flake.nix : rewrite (#4605)Philip Taron
* flake.lock: update to hotfix CUDA::cuda_driver Required to support https://github.com/ggerganov/llama.cpp/pull/4606 * flake.nix: rewrite 1. Split into separate files per output. 2. Added overlays, so that this flake can be integrated into others. The names in the overlay are `llama-cpp`, `llama-cpp-opencl`, `llama-cpp-cuda`, and `llama-cpp-rocm` so that they fit into the broader set of Nix packages from [nixpkgs](https://github.com/nixos/nixpkgs). 3. Use [callPackage](https://summer.nixos.org/blog/callpackage-a-tool-for-the-lazy/) rather than `with pkgs;` so that there's dependency injection rather than dependency lookup. 4. Add a description and meta information for each package. The description includes a bit about what's trying to accelerate each one. 5. Use specific CUDA packages instead of cudatoolkit on the advice of SomeoneSerge. 6. Format with `serokell/nixfmt` for a consistent style. 7. Update `flake.lock` with the latest goods. * flake.nix: use finalPackage instead of passing it manually * nix: unclutter darwin support * nix: pass most darwin frameworks unconditionally ...for simplicity * *.nix: nixfmt nix shell github:piegamesde/nixfmt/rfc101-style --command \ nixfmt flake.nix .devops/nix/*.nix * flake.nix: add maintainers * nix: move meta down to follow Nixpkgs style more closely * nix: add missing meta attributes nix: clarify the interpretation of meta.maintainers nix: clarify the meaning of "broken" and "badPlatforms" nix: passthru: expose the use* flags for inspection E.g.: ``` ❯ nix eval .#cuda.useCuda true ``` * flake.nix: avoid re-evaluating nixpkgs too many times * flake.nix: use flake-parts * nix: migrate to pname+version * flake.nix: overlay: expose both the namespace and the default attribute * ci: add the (Nix) flakestry workflow * nix: cmakeFlags: explicit OFF bools * nix: cuda: reduce runtime closure * nix: fewer rebuilds * nix: respect config.cudaCapabilities * nix: add the impure driver's location to the DT_RUNPATHs * nix: clean sources more thoroughly ...this way outPaths change less frequently, and so there are fewer rebuilds * nix: explicit mpi support * nix: explicit jetson support * flake.nix: darwin: only expose the default --------- Co-authored-by: Someone Serge <sergei.kozlukov@aalto.fi>
2023-12-29cmake : fix ld warning duplicate libraries libllama.a (#4671)Cuong Trinh Manh
* fix "ld: warning: ignoring duplicate libraries: '../libllama.a'" * fix warning in example.
2023-12-29llava-cli : refactor to use sampling library (#4669)Justine Tunney
This change makes it possible to use flags like `--grammar` when using the `llava-cli` program. The rest is just code cleanup deleting a long standing TODO comment. This change also ensures that logging information is emitted to stderr which helps the `llava-cli` command be more friendly to shell scripts. See Mozilla-Ocho/llamafile@1cd334f
2023-12-29server : replace sleep with condition variables (#4673)Justine Tunney
The server currently schedules tasks using a sleep(5ms) busy loop. This adds unnecessary latency since most sleep implementations do a round up to the system scheduling quantum (usually 10ms). Other libc sleep impls spin for smaller time intervals which results in the server's busy loop consuming all available cpu. Having the explicit notify() / wait() code also helps aid in the readability of the server code. See mozilla-Ocho/llamafile@711344b
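The notify() / wait() pattern described above can be sketched with the standard library as follows. `TaskQueue`, `post`, `wait_and_pop`, and `demo_round_trip` are hypothetical names for this illustration and are not taken from the server source; tasks are reduced to plain integer ids.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <thread>

// Hypothetical task queue: workers block on a condition variable
// instead of polling the queue in a sleep() loop.
class TaskQueue {
public:
    // Producer side: enqueue a task id, then wake one waiting worker.
    void post(int task_id) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push_back(task_id);
        }
        cv_.notify_one();
    }

    // Consumer side: sleep until the queue is non-empty, then take the oldest task.
    // The predicate guards against spurious wakeups.
    int wait_and_pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return !tasks_.empty(); });
        const int id = tasks_.front();
        tasks_.pop_front();
        return id;
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    std::deque<int> tasks_;
};

// Demo: a producer thread posts one task while the caller blocks in wait_and_pop().
int demo_round_trip(int task_id) {
    TaskQueue queue;
    std::thread producer([&queue, task_id] { queue.post(task_id); });
    const int received = queue.wait_and_pop();
    producer.join();
    assert(received == task_id);
    return received;
}
```

With this shape the consumer wakes as soon as a task is posted, so latency no longer depends on a polling interval, and an idle consumer uses no CPU while waiting.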
2023-12-29server : fix OpenAI server sampling w.r.t. penalty. (#4675)SakuraUmi
2023-12-29server : allow to generate multimodal embeddings (#4681)Karthik Sethuraman
2023-12-29main-cmake-pkg : fix build issue (#4665)andrijdavid
* Fix main-cmake-pkg compilation * Use glob to load common files * cmake : fix trailing whitespace --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-12-29llama.swiftui : fix infinite loop, ouput timings, buff UI (#4674)Peter Sugihara
* fix infinite loop * slight UI simplification, clearer UX * clearer UI text, add timings to completion log
2023-12-29scripts : print list of sync commitsGeorgi Gerganov