2024-01-05  ggml : do not sched_yield when calling BLAS (#4761)  (Georgi Gerganov)

    * ggml : do not sched_yield when calling BLAS
    * ggml : fix do_yield logic
    * ggml : simplify do_yield logic

    ggml-ci
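    For context, a minimal sketch of the mechanism this commit adjusts; the function and flag names below are illustrative, not the actual ggml.c code. Worker threads spin-wait between graph nodes, and a do_yield flag decides whether that wait gives up the core:

```cpp
#include <sched.h>
#include <atomic>

// Illustrative spin-wait between graph nodes: `do_yield` decides whether a
// waiting worker thread yields its time slice or busy-spins. Per the commit,
// the flag is derived from whether the node in flight is dispatched to BLAS,
// so threads avoid sched_yield() around BLAS calls.
static void wait_for_next_node(std::atomic<int> & node_n, int last_node_n, bool do_yield) {
    while (node_n.load(std::memory_order_relaxed) == last_node_n) {
        if (do_yield) {
            sched_yield(); // hand the core back to the OS scheduler
        }
        // otherwise: busy-spin for minimal wake-up latency
    }
}
```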
2024-01-05  examples : add few-shot translation example (#4783)  (Georgi Gerganov)
2024-01-04  finetune : remove unused includes (#4756)  (Daniel Bevenius)

    This commit removes unused includes from finetune.cpp.

    Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
2024-01-04  server : send token probs for "stream == false" (#4714)  (Georgi Gerganov)
2024-01-04  Print backend name on test-backend-ops failure (#4751)  (Johannes Gäßler)
2024-01-04  llama.swiftui : support loading custom model from file picker (#4767)  (singularity)

    * swiftui : support loading a model from the file picker
    * swiftui : remove trailing whitespace
2024-01-04  server : fix options in README.md (#4765)  (Michael Coppola)

    * fix examples/server/README.md
    * minor : fix whitespace

    Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-01-04  ggml : include stdlib.h before intrin.h (#4736)  (Georgi Gerganov)
2024-01-04  llama.swiftui : fix build of ggml.metallib (#4754)  (singularity)

    * metal : fix metal backend init failure in swiftui
    * metal : build ggml.metallib instead of copying the src
    * llama.swift : remove debug flags from metallib build

    Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-01-03  train : fix typo in overlapping-samples help msg (#4758)  (Daniel Bevenius)

    This commit fixes a typo in the help message for the --overlapping-samples option.

    Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
2024-01-03  swift : update Package.swift to use ggml as dependency (#4691)  (Ashraful Islam)

    * updates Package.swift to use ggml as a dependency
    * changes the ggml package URL src to ggerganov
2024-01-03  cuda : simplify expression  (Georgi Gerganov)

    Co-authored-by: slaren <slarengh@gmail.com>
2024-01-03  cuda : mark I16 and I32 ops as unsupported  (Georgi Gerganov)

    ggml-ci
2024-01-03  sync : ggml  (Georgi Gerganov)

    ggml-ci
2024-01-03  metal : add kernel_get_rows_i32  (Georgi Gerganov)

    ggml-ci
2024-01-03  scripts : fix sync order + metal sed  (Georgi Gerganov)
2024-01-03  ggml : extend ggml_get_rows, ggml_repeat, ggml_concat (ggml/639)  (Guillaume Wenzek)

    * add more int ops
    * ggml_compute_forward_dup_bytes
    * add tests
    * PR comments
    * tests : minor indentations

    Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
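    A minimal usage sketch of the extended API (hedged: assumes the ggml headers of this revision; the shapes and sizes are arbitrary), showing ggml_get_rows operating on an integer tensor:

```cpp
#include "ggml.h"

int main(void) {
    struct ggml_init_params params = {
        /*.mem_size   =*/ 16*1024*1024,
        /*.mem_buffer =*/ NULL,
        /*.no_alloc   =*/ false,
    };
    struct ggml_context * ctx = ggml_init(params);

    // ggml_get_rows previously handled float/quantized data; after this
    // change it also accepts integer tensors such as GGML_TYPE_I32
    struct ggml_tensor * data = ggml_new_tensor_2d(ctx, GGML_TYPE_I32, 8, 4); // 4 rows of 8
    struct ggml_tensor * ids  = ggml_new_tensor_1d(ctx, GGML_TYPE_I32, 2);    // row indices
    struct ggml_tensor * out  = ggml_get_rows(ctx, data, ids);                // 2 gathered rows
    (void) out;

    ggml_free(ctx);
    return 0;
}
```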
2024-01-03  server : throw an error when `slot unavailable` (#4741)  (Justin Parker)
2024-01-02  metal : optimize ggml_mul_mat_id (faster Mixtral PP) (#4725)  (Georgi Gerganov)

    * ggml : disable fast-math for Metal (cmake build only)
    * metal : fix Metal API debug warnings
    * cmake : add -fno-inline for Metal build (#4545)
    * metal : fix API debug warnings
    * metal : fix compile warnings
    * metal : use uint64_t for strides
    * cmake : rename option to LLAMA_METAL_SHADER_DEBUG
    * metal : fix mat-vec Q8_0 kernel for BS > 1
    * metal : normalize mat-vec kernel signatures
    * cmake : respect LLAMA_QKK_64 option
    * metal : fix mat-vec Q4_K kernel for QK_K == 64
    * metal : optimize ggml_mul_mat_id (wip)
    * metal : minor fix
    * metal : opt mul_mm_id

    ggml-ci
2024-01-02  server : add token counts to html footer (#4738)  (Phil H)

    * server : add token counts to stats
    * server : generate hpp

    Co-authored-by: phiharri <ph@got-root.co.uk>
2024-01-02  llama : llama_model_desc print number of experts  (Georgi Gerganov)
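    A usage sketch for the API in question (hedged: assumes a model handle obtained through the llama.h loading API of this era); the description string now reports the expert count for MoE models such as Mixtral:

```cpp
#include <cstdio>
#include "llama.h"

// prints a one-line model description; for mixture-of-experts models this
// now includes the number of experts
void print_model_desc(const struct llama_model * model) {
    char desc[256];
    llama_model_desc(model, desc, sizeof(desc));
    printf("%s\n", desc);
}
```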
2024-01-02  llama : replace all API-facing `int`'s with `int32_t` (#4577)  (Marcus Dunn)

    * replaced all API-facing `int`'s with `int32_t`
    * formatting, and a missed `int` in `llama_token_to_piece`
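    One hedged illustration of the pattern (the exact parameter list may differ from the llama.h of this revision): a fixed-width int32_t pins down the ABI, which matters for foreign-language bindings:

```cpp
#include <cstdint>

struct llama_model;          // opaque handle, as in llama.h
typedef int32_t llama_token; // token ids are fixed-width as well

// before: int llama_token_to_piece(const struct llama_model *, llama_token, char *, int);
// after : every API-facing integer is an explicit int32_t
int32_t llama_token_to_piece(const struct llama_model * model,
                             llama_token token,
                             char * buf,
                             int32_t length);
```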
2024-01-02  llama : differentiate the KV dims in the attention (#4657)  (postmasters)

    * add n_key_dim and n_value_dim: some models use values that are not derived from `n_embd`; also remove `n_embd_head` and `n_embd_gqa` because it is not clear which "head" is referred to (key or value); fixes issue #4648
    * fix `llm_build_kqv` to use `n_value_gqa`
    * rebase
    * rename variables
    * fix llm_build_kqv to be more generic wrt n_embd_head_k
    * update default values for n_embd_head_k and n_embd_head_v
    * fix llm_load_tensors: the asserts were not backwards compatible

    Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
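    A sketch of the resulting hyperparameter split (field and helper names follow the PR; the real definitions live in llama.cpp's hparams): key and value heads get independent widths, and the KV cache row sizes are derived from each:

```cpp
#include <cstdint>

// illustrative fragment: separate per-head key/value widths replace a single
// head size that was previously derived from n_embd / n_head
struct hparams_sketch {
    uint32_t n_head_kv;     // number of KV heads (GQA)
    uint32_t n_embd_head_k; // per-head key width
    uint32_t n_embd_head_v; // per-head value width

    uint32_t n_embd_k_gqa() const { return n_embd_head_k * n_head_kv; } // K cache row size
    uint32_t n_embd_v_gqa() const { return n_embd_head_v * n_head_kv; } // V cache row size
};
```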
2024-01-02  editorconfig : fix whitespace and indentation (#4710)  (Georgi Gerganov)
2024-01-02  server : add --override-kv parameter (#4710)  (minarchist)

    * Changes to server to allow metadata override
    * documentation

    Co-authored-by: John <john@jLap.lan>
    Co-authored-by: Someone Serge <sergei.kozlukov@aalto.fi>
2024-01-02  py : re-enable mmap in convert hf (#4732)  (Nam D. Tran)

    * update: awq support llama-7b model
    * update: change order
    * update: benchmark results for llama2-7b
    * update: mistral 7b v1 benchmark
    * update: support 4 models
    * fix: Readme
    * update: ready for PR
    * update: readme
    * fix: readme
    * update: change order import
    * black
    * format code
    * update: work for both mpt and awqmpt
    * update: readme
    * Rename to llm_build_ffn_mpt_awq
    * Formatted other files
    * Fixed params count
    * fix: remove code
    * update: more detail for mpt
    * fix: readme
    * fix: readme
    * update: change folder architecture
    * fix: common.cpp
    * fix: readme
    * fix: remove ggml_repeat
    * update: cicd
    * update: cicd
    * update: remove use_awq arg
    * update: readme
    * llama : adapt plamo to new ffn
    * fix: update torch version

    ggml-ci

    Co-authored-by: Trần Đức Nam <v.namtd12@vinai.io>
    Co-authored-by: Le Hoang Anh <v.anhlh33@vinai.io>
    Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-01-02  finetune : fix typo in README.md (#4733)  (Daniel Bevenius)

    Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
2024-01-02  metal : enable shader debugging (cmake option) (#4705)  (Georgi Gerganov)

    * ggml : disable fast-math for Metal (cmake build only)
    * metal : fix Metal API debug warnings
    * cmake : add -fno-inline for Metal build (#4545)
    * metal : fix API debug warnings
    * metal : fix compile warnings
    * metal : use uint64_t for strides
    * cmake : rename option to LLAMA_METAL_SHADER_DEBUG
    * metal : fix mat-vec Q8_0 kernel for BS > 1
    * metal : normalize mat-vec kernel signatures
    * cmake : respect LLAMA_QKK_64 option
    * metal : fix mat-vec Q4_K kernel for QK_K == 64

    ggml-ci
2023-12-31  flake.lock: update  (Someone Serge)

    to a commit recently cached by nixpkgs-cuda-ci

2023-12-31  flake.nix: suggest the binary caches  (Someone Serge)
2023-12-31  workflows: nix-ci: add a qemu job for jetsons  (Someone Serge)
2023-12-31  workflows: nix-flakestry: drop tag filters  (Someone Serge)

    ...and add a job for flakehub.com

2023-12-31  workflows: weekly `nix flake update`  (Someone Serge)
2023-12-31  workflows: nix-ci: add a job for eval  (Someone Serge)
2023-12-31  workflows: nix-ci: init; build flake outputs  (Someone Serge)
2023-12-31  flake.nix: expose checks  (Someone Serge)
2023-12-31  flake.nix: rocm not yet supported on aarch64, so hide the output  (Someone Serge)
2023-12-31  flake.nix: expose full scope in legacyPackages  (Someone Serge)
2023-12-31  ggml : add ggml_vdotq_s32 alias (#4715)  (Georgi Gerganov)

    ggml-ci
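    The alias gives ggml one spelling for the int8 dot product on ARM: the native vdotq_s32 where the dotprod extension exists, and a widening multiply-add fallback elsewhere. Roughly (ARM-only sketch; see the ggml sources for the real definition):

```cpp
#include <arm_neon.h>

#if !defined(__ARM_FEATURE_DOTPROD)
// fallback: widen to 16-bit products, then pairwise-accumulate into 32-bit lanes
inline static int32x4_t ggml_vdotq_s32(int32x4_t acc, int8x16_t a, int8x16_t b) {
    const int16x8_t p0 = vmull_s8(vget_low_s8 (a), vget_low_s8 (b));
    const int16x8_t p1 = vmull_s8(vget_high_s8(a), vget_high_s8(b));
    return vaddq_s32(acc, vaddq_s32(vpaddlq_s16(p0), vpaddlq_s16(p1)));
}
#else
// with dotprod available, the alias is the native instruction
#define ggml_vdotq_s32(a, b, c) vdotq_s32(a, b, c)
#endif
```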
2023-12-30  clip : refactor + bug fixes (#4696)  (Georgi Gerganov)

    * clip : refactor + bug fixes
    * server : add log message

    ggml-ci
2023-12-30  CUDA: fixed tensor cores not being used on RDNA3 (#4697)  (Johannes Gäßler)
2023-12-30  ggml : add ggml_cpu_has_avx_vnni() (#4589)  (automaticcat)

    * feat: add avx_vnni based on intel documents
    * ggml : add avx vnni based on intel document
    * llama : add avx vnni information display
    * docs : add more details about using oneMKL and oneAPI for intel processors
    * Update ggml.c: fix indentation

    Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
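    The new capability probe follows the existing ggml_cpu_has_* pattern: a compile-time macro check surfaced as a runtime function. A sketch consistent with that pattern:

```cpp
// ggml reports the CPU features it was *compiled* with; AVX-VNNI support
// is detected via the compiler-defined __AVXVNNI__ macro
extern "C" int ggml_cpu_has_avx_vnni(void) {
#if defined(__AVXVNNI__)
    return 1;
#else
    return 0;
#endif
}
```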
2023-12-29  CUDA: fix tensor core logic for Pascal and HIP (#4682)  (Johannes Gäßler)
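    Background for the check involved: NVIDIA tensor cores exist from Volta (compute capability 7.0) onward, so Pascal (6.x) must take the regular kernels, and HIP devices report capabilities on a different scheme that has to be handled separately. An illustrative guard, not the exact ggml-cuda.cu code:

```cpp
// illustrative: ggml-cuda gates its tensor-core mat-mul paths on the
// device's compute capability (CC_VOLTA matches the constant used there)
#define CC_VOLTA 700

static bool has_tensor_cores(int compute_capability) {
    // Pascal is compute capability 6.x, below the Volta threshold, so it
    // must fall back to the non-tensor-core kernels
    return compute_capability >= CC_VOLTA;
}
```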
2023-12-29  clip : use ggml_backend_buffer_is_host (#4205)  (Georgi Gerganov)
2023-12-29  clip : enable gpu backend (#4205)  (Steward Garcia)

    * clip : enable CUDA backend
    * add missing kernels
    * add enough padding for alignment
    * remove ggml_repeat from clip.cpp
    * add metal backend
    * llava : fixes - avoid ggml_repeat, use GGML_USE_ instead of CLIP_USE_ macros, remove unused vars

    Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-12-29  cuda : fix VMM OOM issue on NVIDIA AGX Orin (#4687)  (hydai)

    Signed-off-by: hydai <hydai@secondstate.io>
2023-12-29  python : add check-requirements.sh and GitHub workflow (#4585)  (crasm)

    * python : add check-requirements.sh and GitHub workflow; the script and workflow force package versions to remain compatible across all convert*.py scripts, while allowing secondary convert scripts to import dependencies not wanted in convert.py
    * move requirements into ./requirements
    * fail when "==" is used for package requirements (can be suppressed)
    * enforce "compatible release" syntax instead of ==
    * update workflow
    * add an upper version bound for transformers and protobuf
    * improve check-requirements.sh
    * small syntax change
    * don't remove venvs if nocleanup is passed
    * see if this fixes the docker workflow
    * move check-requirements.sh into ./scripts/

    Co-authored-by: Jared Van Bortel <jared@nomic.ai>
2023-12-29  flake.nix : rewrite (#4605)  (Philip Taron)

    * flake.lock: update to hotfix CUDA::cuda_driver, required to support https://github.com/ggerganov/llama.cpp/pull/4606
    * flake.nix: rewrite
      1. Split into separate files per output.
      2. Added overlays, so that this flake can be integrated into others. The names in the overlay are `llama-cpp`, `llama-cpp-opencl`, `llama-cpp-cuda`, and `llama-cpp-rocm` so that they fit into the broader set of Nix packages from [nixpkgs](https://github.com/nixos/nixpkgs).
      3. Use [callPackage](https://summer.nixos.org/blog/callpackage-a-tool-for-the-lazy/) rather than `with pkgs;` so that there is dependency injection rather than dependency lookup.
      4. Add a description and meta information for each package; each description includes a bit about what the package is trying to accelerate.
      5. Use specific CUDA packages instead of cudatoolkit, on the advice of SomeoneSerge.
      6. Format with `serokell/nixfmt` for a consistent style.
      7. Update `flake.lock` with the latest goods.
    * flake.nix: use finalPackage instead of passing it manually
    * nix: unclutter darwin support
    * nix: pass most darwin frameworks unconditionally, for simplicity
    * *.nix: nixfmt (via `nix shell github:piegamesde/nixfmt/rfc101-style --command nixfmt flake.nix .devops/nix/*.nix`)
    * flake.nix: add maintainers
    * nix: move meta down to follow Nixpkgs style more closely
    * nix: add missing meta attributes; clarify the interpretation of meta.maintainers and the meaning of "broken" and "badPlatforms"; passthru: expose the use* flags for inspection, e.g. `nix eval .#cuda.useCuda` prints `true`
    * flake.nix: avoid re-evaluating nixpkgs too many times
    * flake.nix: use flake-parts
    * nix: migrate to pname+version
    * flake.nix: overlay: expose both the namespace and the default attribute
    * ci: add the (Nix) flakestry workflow
    * nix: cmakeFlags: explicit OFF bools
    * nix: cuda: reduce runtime closure
    * nix: fewer rebuilds
    * nix: respect config.cudaCapabilities
    * nix: add the impure driver's location to the DT_RUNPATHs
    * nix: clean sources more thoroughly, so that outPaths change less frequently and there are fewer rebuilds
    * nix: explicit mpi support
    * nix: explicit jetson support
    * flake.nix: darwin: only expose the default

    Co-authored-by: Someone Serge <sergei.kozlukov@aalto.fi>
2023-12-29  cmake : fix ld warning about duplicate libraries libllama.a (#4671)  (Cuong Trinh Manh)

    * fix "ld: warning: ignoring duplicate libraries: '../libllama.a'"
    * fix warning in example
2023-12-29  llava-cli : refactor to use sampling library (#4669)  (Justine Tunney)

    This change makes it possible to use flags like `--grammar` with the `llava-cli` program. The rest is code cleanup, including the removal of a long-standing TODO comment. The change also ensures that logging information is emitted to stderr, which makes `llava-cli` friendlier to shell scripts. See Mozilla-Ocho/llamafile@1cd334f.