Age | Commit message (Collapse) | Author |
|
|
|
Supercedes #4024 and #4813.
CMake's native HIP support has become the
recommended way to add HIP code into a project (see
[here](https://rocm.docs.amd.com/en/docs-6.0.0/conceptual/cmake-packages.html#using-hip-in-cmake)).
This PR makes the following changes:
1. The environment variable `HIPCXX` or CMake option
`CMAKE_HIP_COMPILER` should be used to specify the HIP
compiler. Notably this shouldn't be `hipcc`, but ROCm's clang,
which usually resides in `$ROCM_PATH/llvm/bin/clang`. Previously
this was control by `CMAKE_C_COMPILER` and `CMAKE_CXX_COMPILER`.
Note that since native CMake HIP support is not yet available on
Windows, on Windows we fall back to the old behavior.
2. CMake option `CMAKE_HIP_ARCHITECTURES` is used to control the
GPU architectures to build for. Previously this was controled by
`GPU_TARGETS`.
3. Updated the Nix recipe to account for these new changes.
4. The GPU targets to build against in the Nix recipe is now
consistent with the supported GPU targets in nixpkgs.
5. Added CI checks for HIP on both Linux and Windows. On Linux, we test
both the new and old behavior.
The most important part about this PR is the separation of the
HIP compiler and the C/C++ compiler. This allows users to choose
a different C/C++ compiler if desired, compared to the current
situation where when building for ROCm support, everything must be
compiled with ROCm's clang.
~~Makefile is unchanged. Please let me know if we want to be
consistent on variables' naming because Makefile still uses
`GPU_TARGETS` to control architectures to build for, but I feel
like setting `CMAKE_HIP_ARCHITECTURES` is a bit awkward when you're
calling `make`.~~ Makefile used `GPU_TARGETS` but the README says
to use `AMDGPU_TARGETS`. For consistency with CMake, all usage of
`GPU_TARGETS` in Makefile has been updated to `AMDGPU_TARGETS`.
Thanks to the suggestion of @jin-eld, to maintain backwards
compatibility (and not break too many downstream users' builds), if
`CMAKE_CXX_COMPILER` ends with `hipcc`, then we still compile using
the original behavior and emit a warning that recommends switching
to the new HIP support. Similarly, if `AMDGPU_TARGETS` is set but
`CMAKE_HIP_ARCHITECTURES` is not, then we forward `AMDGPU_TARGETS`
to `CMAKE_HIP_ARCHITECTURES` to ease the transition to the new
HIP support.
Signed-off-by: Gavin Zhao <git@gzgz.dev>
|
|
...`) (#6964)
* readme: cmake . -B build && cmake --build build
* build: fix typo
Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>
* build: drop implicit . from cmake config command
* build: remove another superfluous .
* build: update MinGW cmake commands
* Update README-sycl.md
Co-authored-by: Neo Zhang Jianyu <jianyu.zhang@intel.com>
* build: reinstate --config Release as not the default w/ some generators + document how to build Debug
* build: revert more --config Release
* build: nit / remove -H from cmake example
* build: reword debug instructions around single/multi config split
---------
Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>
Co-authored-by: Neo Zhang Jianyu <jianyu.zhang@intel.com>
|
|
* server: add cURL support to `full.Dockerfile`
* server: add cURL support to `full-cuda.Dockerfile` and `server-cuda.Dockerfile`
* server: add cURL support to `full-rocm.Dockerfile` and `server-rocm.Dockerfile`
* server: add cURL support to `server-intel.Dockerfile`
* server: add cURL support to `server-vulkan.Dockerfile`
* fix typo in `server-vulkan.Dockerfile`
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
|
|
|
|
* fixed deprecated address
* fixed deprecated address
* fixed deprecated address
* Added 'Apache-2.0' SPDX license identifier due to 'kompute.cc' submodule licensing. Explanation of licensing method: https://docs.fedoraproject.org/en-US/legal/spdx/#_and_expressions
* Added 'Apache-2.0' SPDX license identifier due to 'kompute.cc' submodule licensing. Explanation of licensing method: https://docs.fedoraproject.org/en-US/legal/spdx/#_and_expressions
* Added 'Apache-2.0' SPDX license identifier due to 'kompute.cc' submodule licensing. Explanation of licensing method: https://docs.fedoraproject.org/en-US/legal/spdx/#_and_expressions
* reverted back to only the MIT license
|
|
|
|
|
|
|
|
|
|
- The generic /usr/bin/env shebangs are good enough
- Python deps are provisioned in the devShells
- We need to be able to leave python out at least on windows (currently breaks eval)
|
|
initial nix build for windows using zig
mingwW64 build
removes nix zig windows build
removes nix zig windows build
removed unnessesary glibc.static
removed unnessesary import of pkgs in nix
fixed missing trailing newline on non-windows nix builds
overriding stdenv when building for crosscompiling to windows in nix
better variables when crosscompiling windows in nix
cross compile windows on macos
removed trailing whitespace
remove unnessesary overwrite of "CMAKE_SYSTEM_NAME" in nix windows build
nix: keep file extension when copying result files during cross compile for windows
nix: better checking for file extensions when using MinGW
nix: using hostPlatform instead of targetPlatform when cross compiling for Windows
using hostPlatform.extensions.executable to extract executable format
|
|
* Symlink to /usr/bin/xcrun so that `xcrun` binary
is usable during build (used for compiling Metal shaders)
Fixes https://github.com/ggerganov/llama.cpp/issues/6117
* cmake - copy default.metallib to install directory
When metal files are compiled to default.metallib, Cmake needs to add this to the install directory so that it's visible to llama-cpp
Also, update package.nix to use absolute path for default.metallib (it's not finding the bundle)
* add `precompileMetalShaders` flag (defaults to false) to disable precompilation of metal shader
Precompilation requires Xcode to be installed and requires disable sandbox on nix-darwin
|
|
|
|
Since no blas was provided to buildInputs, the executable is built without blas support.
This is a backport of NixOS/nixpkgs#298567
|
|
|
|
|
|
* build(nix): Introduce flake.formatter for `nix fmt`
* chore: Switch to pkgs.nixfmt-rfc-style
|
|
Exposes a few attributes demonstrating how to build [singularity](https://docs.sylabs.io/guides/latest/user-guide/)/[apptainer](https://apptainer.org/) and Docker images re-using llama.cpp's Nix expression.
Built locally on `x86_64-linux` with `nix build github:someoneserge/llama.cpp/feat/nix/images#llamaPackages.{docker,docker-min,sif,llama-cpp}` and it's fast and effective.
|
|
|
|
|
|
* add vulkan dockerfile
* intel dockerfile: compile sycl by default
* fix vulkan dockerfile
* add docs for vulkan
* docs: sycl build in docker
* docs: remove trailing spaces
* docs: sycl: add docker section
* docs: clarify install vulkan SDK outside docker
* sycl: use intel/oneapi-basekit docker image
* docs: correct TOC
* docs: correct docker image for Intel oneMKL
|
|
* feat: add Dockerfiles for each platform that user ./server instead of ./main
* feat: update .github/workflows/docker.yml to build server-first docker containers
* doc: add information about running the server with Docker to README.md
* doc: add information about running with docker to the server README
* doc: update n-gpu-layers to show correct GPU usage
* fix(doc): update container tag from `server` to `server-cuda` for README example on running server container with CUDA
|
|
thx to @SomeoneSerge for the suggestion!
|
|
this fixes the error I encountered when trying to run the convert.py
script in a venv:
```
$ nix develop
[...]$ source .venv/bin/activate
(.venv)
[...]$ pip3 install -r requirements.txt
<... clipped ...>
[...]$ python3 ./convert.py
Traceback (most recent call last):
File "/home/mhueschen/projects-reference/llama.cpp/./convert.py", line 40, in <module>
from sentencepiece import SentencePieceProcessor
File "/home/mhueschen/projects-reference/llama.cpp/.venv/lib/python3.11/site-packages/sentencepiece/__init__.py", line 13, in <module>
from . import _sentencepiece
ImportError: libstdc++.so.6: cannot open shared object file: No such file or directory
```
however, I am not sure this is the cleanest way to address this linker
issue...
|
|
Co-authored-by: Xuan Son Nguyen <xuanson.nguyen@snowpack.eu>
|
|
|
|
|
|
|
|
* llama : support StableLM 2 1.6B
* convert : fix Qwen's set_vocab wrongly naming all special tokens [PAD{id}]
* convert : refactor Qwen's set_vocab to use it for StableLM 2 too
* nix : add tiktoken to llama-python-extra
* convert : use presence of tokenizer.json to determine StableLM tokenizer loader
It's a less arbitrary heuristic than the vocab size.
|
|
|
|
betwen -> between
|
|
|
|
* python: add check-requirements.sh and GitHub workflow
This script and workflow forces package versions to remain compatible
across all convert*.py scripts, while allowing secondary convert scripts
to import dependencies not wanted in convert.py.
* Move requirements into ./requirements
* Fail on "==" being used for package requirements (but can be suppressed)
* Enforce "compatible release" syntax instead of ==
* Update workflow
* Add upper version bound for transformers and protobuf
* improve check-requirements.sh
* small syntax change
* don't remove venvs if nocleanup is passed
* See if this fixes docker workflow
* Move check-requirements.sh into ./scripts/
---------
Co-authored-by: Jared Van Bortel <jared@nomic.ai>
|
|
* flake.lock: update to hotfix CUDA::cuda_driver
Required to support https://github.com/ggerganov/llama.cpp/pull/4606
* flake.nix: rewrite
1. Split into separate files per output.
2. Added overlays, so that this flake can be integrated into others.
The names in the overlay are `llama-cpp`, `llama-cpp-opencl`,
`llama-cpp-cuda`, and `llama-cpp-rocm` so that they fit into the
broader set of Nix packages from [nixpkgs](https://github.com/nixos/nixpkgs).
3. Use [callPackage](https://summer.nixos.org/blog/callpackage-a-tool-for-the-lazy/)
rather than `with pkgs;` so that there's dependency injection rather
than dependency lookup.
4. Add a description and meta information for each package.
The description includes a bit about what's trying to accelerate each one.
5. Use specific CUDA packages instead of cudatoolkit on the advice of SomeoneSerge.
6. Format with `serokell/nixfmt` for a consistent style.
7. Update `flake.lock` with the latest goods.
* flake.nix: use finalPackage instead of passing it manually
* nix: unclutter darwin support
* nix: pass most darwin frameworks unconditionally
...for simplicity
* *.nix: nixfmt
nix shell github:piegamesde/nixfmt/rfc101-style --command \
nixfmt flake.nix .devops/nix/*.nix
* flake.nix: add maintainers
* nix: move meta down to follow Nixpkgs style more closely
* nix: add missing meta attributes
nix: clarify the interpretation of meta.maintainers
nix: clarify the meaning of "broken" and "badPlatforms"
nix: passthru: expose the use* flags for inspection
E.g.:
```
❯ nix eval .#cuda.useCuda
true
```
* flake.nix: avoid re-evaluating nixpkgs too many times
* flake.nix: use flake-parts
* nix: migrate to pname+version
* flake.nix: overlay: expose both the namespace and the default attribute
* ci: add the (Nix) flakestry workflow
* nix: cmakeFlags: explicit OFF bools
* nix: cuda: reduce runtime closure
* nix: fewer rebuilds
* nix: respect config.cudaCapabilities
* nix: add the impure driver's location to the DT_RUNPATHs
* nix: clean sources more thoroughly
...this way outPaths change less frequently,
and so there are fewer rebuilds
* nix: explicit mpi support
* nix: explicit jetson support
* flake.nix: darwin: only expose the default
---------
Co-authored-by: Someone Serge <sergei.kozlukov@aalto.fi>
|
|
|
|
* Added Cloud-V File
* Replaced Makefile with original one
---------
Co-authored-by: moiz.hussain <moiz.hussain@10xengineers.ai>
|
|
|
|
* [Docker] fix tools.sh argument passing.
This should allow passing multiple arguments to containers with
the full image that are using the tools.sh frontend.
Fix from https://github.com/ggerganov/llama.cpp/issues/2535#issuecomment-1697091734
|
|
* Corrections and systemd units
* Missing dependency clblast
|
|
* use hipblas based on cublas
* Update Makefile for the Cuda kernels
* Expand arch list and make it overrideable
* Fix multi GPU on multiple amd architectures with rocblas_initialize() (#5)
* add hipBLAS to README
* new build arg LLAMA_CUDA_MMQ_Y
* fix half2 decomposition
* Add intrinsics polyfills for AMD
* AMD assembly optimized __dp4a
* Allow overriding CC_TURING
* use "ROCm" instead of "CUDA"
* ignore all build dirs
* Add Dockerfiles
* fix llama-bench
* fix -nommq help for non CUDA/HIP
---------
Co-authored-by: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Co-authored-by: ardfork <134447697+ardfork@users.noreply.github.com>
Co-authored-by: funnbot <22226942+funnbot@users.noreply.github.com>
Co-authored-by: Engininja2 <139037756+Engininja2@users.noreply.github.com>
Co-authored-by: Kerfuffle <44031344+KerfuffleV2@users.noreply.github.com>
Co-authored-by: jammm <2500920+jammm@users.noreply.github.com>
Co-authored-by: jdecourval <7315817+jdecourval@users.noreply.github.com>
|
|
* Create llama-cpp.srpm
* Rename llama-cpp.srpm to llama-cpp.srpm.spec
Correcting extension.
* Tested spec success.
* Update llama-cpp.srpm.spec
* Create lamma-cpp-cublas.srpm.spec
* Create lamma-cpp-clblast.srpm.spec
* Update lamma-cpp-cublas.srpm.spec
Added BuildRequires
* Moved to devops dir
|
|
This prevents accidentally expanding arguments that contain spaces.
|
|
|
|
Co-authored-by: canardleteer <eris.has.a.dad+github@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
|
|
* Modify Dockerfile default character set to improve compatibility (#1673)
|
|
Deprecation disclaimer was added to convert-pth-to-ggml.py
|
|
Git added to build packages for version information in docker image
Signed-off-by: Jiri Podivin <jpodivin@gmail.com>
|
|
instead of `int` (while `int` option still being supported)
This allows the following usage:
`./quantize ggml-model-f16.bin ggml-model-q4_0.bin q4_0`
instead of:
`./quantize ggml-model-f16.bin ggml-model-q4_0.bin 2`
|
|
after #545 we do not need torch, tqdm and requests in the dependencies
|