author     Abhilash Majumder <30946547+abhilash1910@users.noreply.github.com>   2024-01-28 21:26:23 +0530
committer  GitHub <noreply@github.com>                                          2024-01-28 17:56:23 +0200
commit     0f648573dde61c510560f68244f70ece7e60d8c1 (patch)
tree       fc8d92d8b588b1a668b48b8a66ff15e5898d0828 /examples/sycl
parent     b764b8f1d079ba44d912801ce6d29bd0d94d51cf (diff)
ggml : add unified SYCL backend for Intel GPUs (#2690)
* first update for migration
* update init_cublas
* add debug function, commit all helper code
* step 1
* step 2
* step3 add fp16, slower 31->28
* add GGML_LIST_DEVICE function
* step 5 format device and print
* step6, enhance error check, remove CUDA macro, enhance device id to fix non-zero id issue
* support main device is non-zero
* step7 add debug for code path, rm log
* step 8, rename all macros & funcs from cuda to sycl
* fix error of select non-zero device, format device list
* rename ggml-sycl.hpp -> ggml-sycl.h
* clear CMAKE to rm unused lib and options
* correct queue: rm dpct::get_queue
* add print tensor function to debug
* fix error: wrong result in 658746bb26702e50f2c59c0e4ada8e9da6010481
* summary dpct definition in one header file to replace folder:dpct
* refactor device log
* mv dpct definition from folder dpct to ggml-sycl.h
* update readme, refactor build script
* fix build with sycl
* set nthread=1 when sycl, increase performance
* add run script, comment debug code
* add ls-sycl-device tool
* add ls-sycl-device, rm unused files
* rm trailing spaces
* dos2unix
* Update README_sycl.md
* fix return type
* remove sycl version from include path
* restore rm code to fix hang issue
* add syc and link for sycl readme
* rm original sycl code before refactor
* fix code err
* add known issue for pvc hang issue
* enable SYCL_F16 support
* align pr4766
* check for sycl blas, better performance
* cleanup 1
* remove extra endif
* add build&run script, clean CMakefile, update guide by review comments
* rename macro to intel hardware
* editor config format
* format fixes
* format fixes
* editor format fix
* Remove unused headers
* skip build sycl tool for other code path
* replace tab by space
* fix blas matmul function
* fix mac build
* restore hip dependency
* fix conflict
* rename as per review comments
* mv internal function to .cpp file
* export function print_sycl_devices(), mv class dpct definition to source file
* update CI/action for sycl code, fix CI error of repeat/dup
* fix action ID format issue
* rm unused strategy
* enable llama_f16 in ci
* fix conflict
* fix build break on macOS, since the macOS CI depends on external ggml instead of internal ggml
* fix ci cases for unsupported data type
* revert unrelated changes in cuda cmake
remove useless nommq
fix typo of GGML_USE_CLBLAS_SYCL
* revert hip cmake changes
* fix indent
* add prefix in func name
* revert no mmq
* rm cpu blas duplicate
* fix no_new_line
* fix src1->type==F16 bug.
* pass batch offset for F16 src1
* fix batch error
* fix wrong code
* revert sycl checking in test-sampling
* pass void as the argument of ggml_backend_sycl_print_sycl_devices
* remove extra blank line in test-sampling
* revert setting n_threads in sycl
* implement std::isinf for icpx with fast math (a hedged sketch follows the co-author trailers below)
* Update ci/run.sh
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Update examples/sycl/run-llama2.sh
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Update examples/sycl/run-llama2.sh
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Update CMakeLists.txt
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Update CMakeLists.txt
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Update CMakeLists.txt
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Update CMakeLists.txt
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* add copyright and MIT license declaration
* update the cmd example
---------
Co-authored-by: jianyuzh <jianyu.zhang@intel.com>
Co-authored-by: luoyu-intel <yu.luo@intel.com>
Co-authored-by: Meng, Hengyu <hengyu.meng@intel.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
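
Regarding the std::isinf bullet above: when icpx compiles with fast-math, the compiler may assume no value is ever infinite and fold std::isinf to a constant, so an explicit bit-pattern test is one way to keep the check meaningful. The following is only a minimal sketch of that idea; the helper name sycl_isinf and its exact shape are assumptions for illustration, not the code this commit adds.

```
// Sketch (assumption, not the commit's code): detect +/-inf from the raw
// IEEE-754 bit pattern so the test is not optimized away under fast-math.
#include <cstdint>
#include <cstring>

static bool sycl_isinf(float f) {
    uint32_t bits;
    std::memcpy(&bits, &f, sizeof(bits));        // reinterpret the float's bits
    return (bits & 0x7fffffffu) == 0x7f800000u;  // exponent all ones, mantissa zero
}
```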
Diffstat (limited to 'examples/sycl')
-rw-r--r--   examples/sycl/CMakeLists.txt      |  9
-rw-r--r--   examples/sycl/README.md           | 47
-rwxr-xr-x   examples/sycl/build.sh            | 20
-rw-r--r--   examples/sycl/ls-sycl-device.cpp  | 11
-rwxr-xr-x   examples/sycl/run-llama2.sh       | 19
5 files changed, 106 insertions, 0 deletions
diff --git a/examples/sycl/CMakeLists.txt b/examples/sycl/CMakeLists.txt
new file mode 100644
index 00000000..69cf8932
--- /dev/null
+++ b/examples/sycl/CMakeLists.txt
@@ -0,0 +1,9 @@
+# MIT license
+# Copyright (C) 2024 Intel Corporation
+# SPDX-License-Identifier: MIT
+
+set(TARGET ls-sycl-device)
+add_executable(${TARGET} ls-sycl-device.cpp)
+install(TARGETS ${TARGET} RUNTIME)
+target_link_libraries(${TARGET} PRIVATE common llama ${CMAKE_THREAD_LIBS_INIT})
+target_compile_features(${TARGET} PRIVATE cxx_std_17)
diff --git a/examples/sycl/README.md b/examples/sycl/README.md
new file mode 100644
index 00000000..b46f17f3
--- /dev/null
+++ b/examples/sycl/README.md
@@ -0,0 +1,47 @@
+# llama.cpp/example/sycl
+
+This example provides the tools for llama.cpp with the SYCL backend on Intel GPUs.
+
+## Tool
+
+|Tool Name|Function|Status|
+|-|-|-|
+|ls-sycl-device|List all SYCL devices with ID, compute capability, max work group size, etc.|Supported|
+
+### ls-sycl-device
+
+List all SYCL devices with ID, compute capability, max work group size, etc.
+
+1. Build llama.cpp for SYCL for all targets.
+
+2. Enable the oneAPI running environment
+
+```
+source /opt/intel/oneapi/setvars.sh
+```
+
+3. Execute
+
+```
+./build/bin/ls-sycl-device
+```
+
+Check the device IDs in the startup log, for example:
+
+```
+found 4 SYCL devices:
+  Device 0: Intel(R) Arc(TM) A770 Graphics, compute capability 1.3,
+    max compute_units 512, max work group size 1024, max sub group size 32, global mem size 16225243136
+  Device 1: Intel(R) FPGA Emulation Device, compute capability 1.2,
+    max compute_units 24, max work group size 67108864, max sub group size 64, global mem size 67065057280
+  Device 2: 13th Gen Intel(R) Core(TM) i7-13700K, compute capability 3.0,
+    max compute_units 24, max work group size 8192, max sub group size 64, global mem size 67065057280
+  Device 3: Intel(R) Arc(TM) A770 Graphics, compute capability 3.0,
+    max compute_units 512, max work group size 1024, max sub group size 32, global mem size 16225243136
+
+```
+
+|Attribute|Note|
+|-|-|
+|compute capability 1.3|Level Zero runtime, recommended|
+|compute capability 3.0|OpenCL runtime, slower than Level Zero in most cases|
diff --git a/examples/sycl/build.sh b/examples/sycl/build.sh
new file mode 100755
index 00000000..26ad2f7d
--- /dev/null
+++ b/examples/sycl/build.sh
@@ -0,0 +1,20 @@
+
+# MIT license
+# Copyright (C) 2024 Intel Corporation
+# SPDX-License-Identifier: MIT
+
+mkdir -p build
+cd build
+source /opt/intel/oneapi/setvars.sh
+
+#for FP16
+#cmake .. -DLLAMA_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DLLAMA_SYCL_F16=ON # faster for long-prompt inference
+
+#for FP32
+cmake .. -DLLAMA_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx
+
+#build example/main only
+#cmake --build . --config Release --target main
+
+#build all binaries
+cmake --build . --config Release -v
diff --git a/examples/sycl/ls-sycl-device.cpp b/examples/sycl/ls-sycl-device.cpp
new file mode 100644
index 00000000..42847154
--- /dev/null
+++ b/examples/sycl/ls-sycl-device.cpp
@@ -0,0 +1,11 @@
+/*MIT license
+  Copyright (C) 2024 Intel Corporation
+  SPDX-License-Identifier: MIT
+*/
+
+#include "ggml-sycl.h"
+
+int main(int argc, char ** argv) {
+    ggml_backend_sycl_print_sycl_devices();
+    return 0;
+}
diff --git a/examples/sycl/run-llama2.sh b/examples/sycl/run-llama2.sh
new file mode 100755
index 00000000..f5f4c1e9
--- /dev/null
+++ b/examples/sycl/run-llama2.sh
@@ -0,0 +1,19 @@
+#!/bin/bash
+
+# MIT license
+# Copyright (C) 2024 Intel Corporation
+# SPDX-License-Identifier: MIT
+
+INPUT2="Building a website can be done in 10 simple steps:\nStep 1:"
+source /opt/intel/oneapi/setvars.sh
+
+if [ $# -gt 0 ]; then
+    export GGML_SYCL_DEVICE=$1
+else
+    export GGML_SYCL_DEVICE=0
+fi
+echo GGML_SYCL_DEVICE=$GGML_SYCL_DEVICE
+#export GGML_SYCL_DEBUG=1
+./build/bin/main -m models/llama-2-7b.Q4_0.gguf -p "${INPUT2}" -n 400 -e -ngl 33 -s 0
+#./build/bin/main -m models/llama-2-7b.Q4_0.gguf -p "${INPUT2}" -n 5 -e -ngl 33 -t 1 -s 0
+
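
For context, a typical workflow with the files added above might look like the short shell session below. This is a sketch with assumptions: the scripts are run from the repository root, the build succeeds, and a quantized model already exists at models/llama-2-7b.Q4_0.gguf, as run-llama2.sh expects.

```
# Configure and build llama.cpp with the SYCL backend (wraps the cmake calls above)
./examples/sycl/build.sh

# List the available SYCL devices and note the ID of the target GPU
./build/bin/ls-sycl-device

# Run llama-2 on device 1; the script exports GGML_SYCL_DEVICE=1 before calling main
./examples/sycl/run-llama2.sh 1
```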