* Typo fix to server's README.md
Fix minor typo ("tonen") in server README.
* server readme grammar/style fixes.
Quickly went through this file looking for inconsistencies in the
presentation of defaults and flag options, and for typos and grammar
issues.
Not perfect, but hopefully improved.
* Update README.md
Remove an extra space before newline.

Co-authored-by: Jonas Holzner <jonas.holzner.external@hensoldt.net>
* initial commit for sealion support
* add sealion support
* minor fix
* q/k layernorm and pos_embd only if required
* Apply suggestions from code review
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* minor : clear whitespaces
---------
Co-authored-by: bryan <bryansiow@aisingapore.org>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* CI: Update actions/checkout to v4
* CI: Update actions/setup-python to v5
* CI: Update actions/upload-artifact to v4
* Create SECURITY.md
Signed-off-by: Joyce <joycebrum@google.com>
* Fix: link on SECURITY.md
Signed-off-by: Joyce <joycebrum@google.com>
* Fix: link on SECURITY.md
Signed-off-by: Joyce <joycebrum@google.com>
* minor
* fix
* fix
---------
Signed-off-by: Joyce <joycebrum@google.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

Co-authored-by: Jared Van Bortel <jared@nomic.ai>
* Add openchat chat template
* Add chat template test for openchat
* Add chat template for vicuna
* Add chat template for orca-vicuna
* Add EOS for vicuna templates
* Combine vicuna chat templates
* Add tests for openchat and vicuna chat templates
* Add chat template for alpaca
* Add separate template name for vicuna-orca
* Remove alpaca, match deepseek with jinja output
* Regenerate chat template test with add_generation_prompt
* Separate deepseek bos from system message
* Match openchat template with jinja output
* Remove BOS token from templates, unprefix openchat
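As a rough illustration of what such a template produces, here is a hedged Python sketch of a vicuna-style format. The exact strings live in llama_chat_apply_template and may differ; this follows the template's common published form.
```python
# Hedged sketch of a vicuna-style chat template (common form; the
# exact strings in llama_chat_apply_template may differ).
def format_vicuna(messages: list[dict]) -> str:
    parts = []
    for m in messages:
        if m["role"] == "system":
            parts.append(m["content"] + "\n\n")
        elif m["role"] == "user":
            parts.append("USER: " + m["content"] + "\n")
        else:  # assistant
            parts.append("ASSISTANT: " + m["content"] + "</s>\n")
    parts.append("ASSISTANT:")  # generation prompt
    return "".join(parts)

print(format_vicuna([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
]))
```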
* ggml : update mul_mat_id to use the same tensor for all the experts
* update cuda
* minor
* update metal
* update test-backend-ops
* fix cuda
* Update ggml-metal.m
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* update convert.py
* update convert-hf-to-gguf.py
* update convert.py for mixtral hf models
* Update convert-hf-to-gguf.py
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* cuda : support non-pow-2 number of experts
* allow quantize to work for split and merged experts models in the same way
* cleanup + disable mmap automatically with split tensors models
* update imatrix
* test-backend-ops : test qwen argsort
* update grok model loading
* llama : add merged experts tensors to the grok tensor map
* minor
* gguf : bump version
* fix quantizing of merged experts
* convert-hf-to-gguf.py : update grok (untested)
* make linter happy
* cuda/argsort : use shared memory instead of pool memory
* convert : fix grok tensor names
* metal : add support for non-pow-2 argsort
* llama : more loader cleanup, better error checking
* cuda : fix warning
* llama : still use mmap for loading old models, but copy the data to a host buffer
* add review note
* llama : remove ffn tensor counting + add sanity check
ggml-ci
* convert : fix handling of n_experts == None
ggml-ci
* imatrix : fix ncall counters
* llama : produce error if imatrix size does not match
* quantize : terminate on errors + trace logs
ggml-ci
* metal : pad shared memory to 16 bytes
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
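To make the first bullet concrete, here is a hedged numpy illustration (not the ggml API) of the merged-experts layout: all expert matrices live in a single 3D tensor, and mul_mat_id conceptually multiplies each input row by the slice selected by its expert id.
```python
import numpy as np

# Illustrative sizes, not taken from any real model
n_expert, n_ff, n_embd, n_tokens = 8, 32, 16, 4

w_exps = np.random.randn(n_expert, n_ff, n_embd)   # one tensor for all experts
x      = np.random.randn(n_tokens, n_embd)         # input rows
ids    = np.random.randint(0, n_expert, n_tokens)  # chosen expert per row

# mul_mat_id, conceptually: each row uses its expert's slice of w_exps
out = np.stack([w_exps[ids[i]] @ x[i] for i in range(n_tokens)])
assert out.shape == (n_tokens, n_ff)
```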
* disable iqx on windows as a workaround
* array instead of global_memory

Flake lock file updates:
• Updated input 'nixpkgs':
'github:NixOS/nixpkgs/44d0940ea560dee511026a53f0e2e2cde489b4d4' (2024-03-23)
→ 'github:NixOS/nixpkgs/d8fe5e6c92d0d190646fb9f1056741a229980089' (2024-03-29)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* ci: server: verify deps are coherent with the commit
* ci: server: change the ref to build since it is now a pull event target
* fixed deprecated address
* fixed deprecated address
* fixed deprecated address
* Added 'Apache-2.0' SPDX license identifier due to 'kompute.cc' submodule licensing. Explanation of licensing method: https://docs.fedoraproject.org/en-US/legal/spdx/#_and_expressions
* Added 'Apache-2.0' SPDX license identifier due to 'kompute.cc' submodule licensing. Explanation of licensing method: https://docs.fedoraproject.org/en-US/legal/spdx/#_and_expressions
* Added 'Apache-2.0' SPDX license identifier due to 'kompute.cc' submodule licensing. Explanation of licensing method: https://docs.fedoraproject.org/en-US/legal/spdx/#_and_expressions
* reverted to only the MIT license
* split by max size
* clean up arg parse
* split: ok
* add dry run option
* error on 0 tensors
* be positive
* remove next_metadata_size
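A hedged sketch of the split-by-max-size idea above (the function and names are illustrative, not the gguf-split implementation): tensors are assigned to shards greedily until the next one would push a shard past the limit, and zero tensors is an error.
```python
# Greedy split of (name, nbytes) tensors into shards of at most max_bytes.
def split_by_max_size(tensors, max_bytes):
    if not tensors:
        raise ValueError("no tensors to split")  # "error on 0 tensors"
    shards, current, size = [], [], 0
    for name, nbytes in tensors:
        if current and size + nbytes > max_bytes:
            shards.append(current)
            current, size = [], 0
        current.append(name)
        size += nbytes
    shards.append(current)
    return shards

# A dry run would print this plan without writing any files.
print(split_by_max_size([("tok_embd", 500), ("blk.0", 800), ("blk.1", 800)], 1000))
# [['tok_embd'], ['blk.0'], ['blk.1']]
```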
* Fix Vulkan no kv offload incoherence
* Add k-quant mul mat mat shaders
* Rework working buffer allocation; reduces VRAM use noticeably
Clean up CPU assist code, replacing it with the ggml-backend offload function
* Default to all dedicated GPUs
* Add fallback to integrated GPUs if no dedicated GPUs are found (selection policy sketched below)
* Add debug info showing which device is allocating memory
* Fix Intel dequant issue
Fix validation issue
* Fix Vulkan GGML_OP_GET_ROWS implementation
* Clean up merge artifacts
* Remove Vulkan warning
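The device-selection bullets above, as a hedged Python sketch (the device records are invented for illustration; the real logic lives in the Vulkan backend): prefer every dedicated GPU, and use integrated GPUs only when no dedicated one exists.
```python
def pick_devices(devices: list[dict]) -> list[dict]:
    # Default to all dedicated (discrete) GPUs ...
    dedicated = [d for d in devices if d["type"] == "discrete"]
    if dedicated:
        return dedicated
    # ... and fall back to integrated GPUs only if none were found.
    return [d for d in devices if d["type"] == "integrated"]

print(pick_devices([{"name": "iGPU", "type": "integrated"}]))  # falls back
```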
* sync : ggml
ggml-ci
* cuda : move GGML_CUDA_DMMV constants to dmmv.cuh
---------
Co-authored-by: slaren <slarengh@gmail.com>
* Support converting xverse models to gguf format.
* 1. Convert xverse models to gguf;
2. Add LLM_ARCH_XVERSE inference in llama.cpp;
3. Add xverse item in Supported models in README.md;
* gguf-py: remove redundant logs
* llama: remove the init_mapping_prefetch custom parameter
* llama.cpp: Include the changes from #6122 to exclude the unused outputs of the last layers.
* - Fix format issues
- Remove duplicate setting of kqv_out in llm_build_kv
* Update llama.cpp
---------
Co-authored-by: willhe <willhe@xverse.cn>
Co-authored-by: willhe <hexin@xverse.cn>

ggml-ci
* readme: add Android UI binding
* Update README.md
* cmake: add explicit metal version options
* Update CMakeLists.txt
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* llama: remove redundant reshape in build_kv_store
This commit removes the reshape of the V matrix in build_kv_store.
The motivation is that the V matrix has the shape:
```console
(gdb) p *v_cur
$46 = {type = GGML_TYPE_F32, backend = GGML_BACKEND_TYPE_CPU,
buffer = 0x0, ne = {4096, 512, 1, 1}, nb = {4, 16384, 8388608,
8388608}, op = GGML_OP_MUL_MAT, op_params = {
0 <repeats 16 times>}, flags = 0, grad = 0x0,
src = {0xb496b0, 0x7ffef1c40950, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
0x0, 0x0}, perf_runs = 0, perf_cycles = 0, perf_time_us = 0,
view_src = 0x0, view_offs = 0, data = 0x0,
name = "Vcur-0", '\000' <repeats 57 times>, extra = 0x0,
padding = "\000\000\000\000\000\000\000"}
```
And after reshaping this tensor we get:
```console
(gdb) p *ggml_reshape_2d(ctx, v_cur, n_embd_v_gqa, n_tokens)
$44 = {type = GGML_TYPE_F32, backend = GGML_BACKEND_TYPE_CPU,
buffer = 0x0, ne = {4096, 512, 1, 1}, nb = {4, 16384, 8388608,
8388608}, op = GGML_OP_RESHAPE, op_params = {
0 <repeats 16 times>}, flags = 0, grad = 0x0,
src = {0x7ffef1c40e00, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
0x0}, perf_runs = 0, perf_cycles = 0, perf_time_us = 0,
view_src = 0x7ffef1c40e00, view_offs = 0, data = 0x0,
name = "Vcur-0 (reshaped)", '\000' <repeats 46 times>, extra = 0x0,
padding = "\000\000\000\000\000\000\000"}
```
I noticed that the `src` and `view_src` fields differ, but the
dimensions are the same. From the code comment it seems the reshape
call is not needed, and the output above motivates removing it (a
numpy analogy is sketched after this commit).
Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
* llama : add assert
---------
Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
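A numpy analogy of the observation above (an analogy only; ggml view tensors differ in detail): reshaping an array to the shape it already has only creates a view, so the call can be dropped in favor of a shape assertion, which is what the commit does.
```python
import numpy as np

n_embd_v_gqa, n_tokens = 4096, 512  # matches the ne values in the dump
v_cur = np.zeros((n_tokens, n_embd_v_gqa), dtype=np.float32)

v = v_cur.reshape(n_tokens, n_embd_v_gqa)  # redundant: same shape
assert v.shape == v_cur.shape and v.base is v_cur  # a view, no new data

# Equivalent without the reshape, guarded by an assert as in the commit:
assert v_cur.shape == (n_tokens, n_embd_v_gqa)
```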
* Allow conversion of Mistral HF models
* Homogenize Llama, Mistral, Mixtral under the same entry.
* Fix tokenizer, permute tensors
* Use the sentencepiece tokenizer, or fall back to hfft (selection sketched below).
* convert-hf : small fix for mypy
* convert-hf : fix duplicated block_count
* convert-hf : add vocab size to metadata
---------
Co-authored-by: Jared Van Bortel <jared@nomic.ai>
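A hedged sketch of that tokenizer selection order (the file names follow Hugging Face conventions; the function itself is illustrative, not the converter's code):
```python
from pathlib import Path

def pick_vocab(model_dir: Path) -> tuple[str, Path]:
    spm = model_dir / "tokenizer.model"  # sentencepiece model
    hft = model_dir / "tokenizer.json"   # HF fast tokenizer ("hfft")
    if spm.exists():
        return "spm", spm   # preferred
    if hft.exists():
        return "hfft", hft  # fallback
    raise FileNotFoundError(f"no tokenizer found in {model_dir}")
```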
* Revisited & updated SYCL build documentation
* removed outdated comment
* Addressed PR comments
* Trimmed trailing whitespace
* added missing newline at end of file
* fix empty bug
* Update MobileVLM-README.md
added more results on devices
* Update MobileVLM-README.md
* Update MobileVLM-README.md
* Update MobileVLM-README.md
* Update MobileVLM-README.md
* Update MobileVLM-README.md
* Update MobileVLM-README.md
* Update examples/llava/MobileVLM-README.md
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Update MobileVLM-README.md
remove gguf links
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

repo (#6365)
* doc: fix outdated default value of batch size
* doc: add doc for ubatch-size
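A hedged sketch of the relationship these docs describe: a logical batch (-b / --batch-size) is evaluated in physical micro-batches (-ub / --ubatch-size), so n_ubatch never exceeds n_batch. The default values below are the documented ones at the time and may change.
```python
# Tokens are consumed in logical batches of n_batch, each evaluated in
# physical micro-batches of n_ubatch (n_ubatch <= n_batch).
def microbatches(n_tokens: int, n_batch: int = 2048, n_ubatch: int = 512):
    assert n_ubatch <= n_batch
    n = min(n_tokens, n_batch)
    for start in range(0, n, n_ubatch):
        yield start, min(start + n_ubatch, n)

print(list(microbatches(1200)))  # [(0, 512), (512, 1024), (1024, 1200)]
```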
Take all dependencies from the cross stage, rather than only stdenv

- The generic /usr/bin/env shebangs are good enough
- Python deps are provisioned in the devShells
- We need to be able to leave python out, at least on windows (it currently breaks eval)

initial nix build for windows using zig
mingwW64 build
removes nix zig windows build
removes nix zig windows build
removed unnecessary glibc.static
removed unnecessary import of pkgs in nix
fixed missing trailing newline on non-windows nix builds
overriding stdenv when cross-compiling to windows in nix
better variables when cross-compiling windows in nix
cross compile windows on macos
removed trailing whitespace
remove unnecessary overwrite of "CMAKE_SYSTEM_NAME" in nix windows build
nix: keep file extension when copying result files during cross compile for windows
nix: better checking for file extensions when using MinGW
nix: using hostPlatform instead of targetPlatform when cross compiling for Windows
using hostPlatform.extensions.executable to extract executable format
* server: bench: init
* server: bench: reduce list of GPU nodes
* server: bench: fix graph, fix output artifact
* ci: bench: add mermaid in case the image cannot be uploaded (sketched below)
* ci: bench: more resilient, more metrics
* ci: bench: trigger build
* ci: bench: fix duration
* ci: bench: fix typo
* ci: bench: fix mermaid values and generated markdown
* typo on the step name
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
* ci: bench: trailing spaces
* ci: bench: move images in a details section
* ci: bench: reduce bullet point size
---------
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
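The mermaid fallback flagged above, as a hedged sketch (the chart type and field names are illustrative, not the workflow's actual output): when the plot image cannot be uploaded, an inline mermaid chart is written into the job summary instead.
```python
FENCE = "`" * 3  # markdown code fence, built up to avoid nesting issues here

def mermaid_xychart(title: str, xs: list, ys: list) -> str:
    # Build a mermaid xychart block that renders directly in GitHub markdown.
    return "\n".join([
        FENCE + "mermaid",
        "xychart-beta",
        f'    title "{title}"',
        "    x-axis [" + ", ".join(map(str, xs)) + "]",
        '    y-axis "tokens/s"',
        "    line [" + ", ".join(map(str, ys)) + "]",
        FENCE,
    ])

print(mermaid_xychart("prompt processing", [1, 2, 4], [101.5, 180.2, 297.9]))
```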
Until https://github.com/ggerganov/llama.cpp/issues/6346 is resolved