Age | Commit message (Collapse) | Author |
|
|
|
|
|
|
|
|
|
openblas v0.3.22 64-bit pkg-config file is named openblas64.pc
https://github.com/OpenMathLib/OpenBLAS/issues/3790
|
|
betwen -> between
|
|
ggml-ci
|
|
|
|
|
|
* ggml : do not sched_yield when calling BLAS
ggml-ci
* ggml : fix do_yield logic
ggml-ci
* ggml : simplify do_yield logic
ggml-ci
|
|
|
|
This commit removes unused includes from finetune.cpp.
Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
|
|
|
|
|
|
* swiftui: support load model from file picker
* swiftui: remove trailing whitespace
|
|
* fix examples/server/README.md
* minor : fix whitespace
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
|
|
|
|
* metal: fix metal backend init failure in swiftui
* metal: build ggml.metallib instead of copy src
* llama.swift : remove debug flags from metallib build
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
|
|
This commit fixes a typo in the help message for the
--overlapping-samples option.
Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
|
|
* updates the package.swift to use ggml as dependency
* changes the ggml package url src to ggerganov
|
|
Co-authored-by: slaren <slarengh@gmail.com>
|
|
ggml-ci
|
|
ggml-ci
|
|
ggml-ci
|
|
|
|
* add more int ops
* ggml_compute_forward_dup_bytes
* add tests
* PR comments
* tests : minor indentations
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
|
|
|
|
* ggml : disable fast-math for Metal (cmake build only)
ggml-ci
* metal : fix Metal API debug warnings
* cmake : add -fno-inline for Metal build (#4545)
* metal : fix API debug warnings
* metal : fix compile warnings
* metal : use uint64_t for strides
* cmake : rename option to LLAMA_METAL_SHADER_DEBUG
* metal : fix mat-vec Q8_0 kernel for BS > 1
* metal : normalize mat-vec kernel signatures
* cmake : respect LLAMA_QKK_64 option
* metal : fix mat-vec Q4_K kernel for QK_K == 64
* metal : optimizing ggml_mul_mat_id (wip)
* metal : minor fix
* metal : opt mul_mm_id
|
|
* server: add token counts to stats
* server: generate hpp
---------
Co-authored-by: phiharri <ph@got-root.co.uk>
|
|
|
|
* replaced all API facing `int`'s with `int32_t`
* formatting and missed `int` in `llama_token_to_piece`
|
|
* Add n_key_dim and n_value_dim
Some models use values that are not derived from `n_embd`.
Also remove `n_embd_head` and `n_embd_gqa` because it is not clear
which "head" is referred to (key or value).
Fix issue #4648.
* Fix `llm_build_kqv` to use `n_value_gqa`
* Rebase
* Rename variables
* Fix llm_build_kqv to be more generic wrt n_embd_head_k
* Update default values for n_embd_head_k and n_embd_head_v
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Fix llm_load_tensors: the asserts were not backcompat
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
|
|
|
|
* Changes to server to allow metadata override
* documentation
* flake.nix: expose full scope in legacyPackages
* flake.nix: rocm not yet supported on aarch64, so hide the output
* flake.nix: expose checks
* workflows: nix-ci: init; build flake outputs
* workflows: nix-ci: add a job for eval
* workflows: weekly `nix flake update`
* workflows: nix-flakestry: drop tag filters
...and add a job for flakehub.com
* workflows: nix-ci: add a qemu job for jetsons
* flake.nix: suggest the binary caches
* flake.lock: update
to a commit recently cached by nixpkgs-cuda-ci
---------
Co-authored-by: John <john@jLap.lan>
Co-authored-by: Someone Serge <sergei.kozlukov@aalto.fi>
|
|
* update: awq support llama-7b model
* update: change order
* update: benchmark results for llama2-7b
* update: mistral 7b v1 benchmark
* update: support 4 models
* fix: Readme
* update: ready for PR
* update: readme
* fix: readme
* update: change order import
* black
* format code
* update: work for bot mpt and awqmpt
* update: readme
* Rename to llm_build_ffn_mpt_awq
* Formatted other files
* Fixed params count
* fix: remove code
* update: more detail for mpt
* fix: readme
* fix: readme
* update: change folder architecture
* fix: common.cpp
* fix: readme
* fix: remove ggml_repeat
* update: cicd
* update: cicd
* uppdate: remove use_awq arg
* update: readme
* llama : adapt plamo to new ffn
ggml-ci
* fix: update torch version
---------
Co-authored-by: Trần Đức Nam <v.namtd12@vinai.io>
Co-authored-by: Le Hoang Anh <v.anhlh33@vinai.io>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
|
|
Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
|
|
* ggml : disable fast-math for Metal (cmake build only)
ggml-ci
* metal : fix Metal API debug warnings
* cmake : add -fno-inline for Metal build (#4545)
* metal : fix API debug warnings
* metal : fix compile warnings
* metal : use uint64_t for strides
* cmake : rename option to LLAMA_METAL_SHADER_DEBUG
* metal : fix mat-vec Q8_0 kernel for BS > 1
* metal : normalize mat-vec kernel signatures
* cmake : respect LLAMA_QKK_64 option
* metal : fix mat-vec Q4_K kernel for QK_K == 64
ggml-ci
|
|
to a commit recently cached by nixpkgs-cuda-ci
|
|
|
|
|
|
...and add a job for flakehub.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
ggml-ci
|
|
* clip : refactor + bug fixes
ggml-ci
* server : add log message
|
|
|