llama : add phi3 128K model support (#7225)

* add phi3 128k support in convert-hf-to-gguf
* add phi3 128k support in cuda
* address build warnings on llama.cpp
* adjust index value in cuda long rope freq factors
* add long rope support in ggml cpu backend
* make freq factors only depend on ctx size
* remove unused rope scaling type 'su' from gguf converter
* fix flint warnings on convert-hf-to-gguf.py
* set to the short freq factor when context size is smaller than trained context size
* add one line of comments
* metal : support rope freq_factors
* ggml : update ggml_rope_ext API to support freq. factors
* backends : add dev messages to support rope freq. factors
* minor : style
* tests : update to use new rope API
* backends : fix pragma semicolons
* minor : cleanup
* llama : move rope factors from KV header to tensors
* llama : remove tmp assert
* cuda : fix compile warning
* convert : read/write n_head_kv
* llama : fix uninitialized tensors

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
author: liuwei-git <14815172+liuwei-git@users.noreply.github.com> 2024-05-22 04:28:32 +0800
committer: GitHub <noreply@github.com> 2024-05-21 23:28:32 +0300
commit: 201cc11afa0a1950e1f632390b2ac6c937a0d8f0 (patch)
tree: 440fb7ecd80b48772a955a80855db29677d172a2 /ggml-sycl.cpp
parent: 6369bf04336ab60e5c892dd77a3246df91015147 (diff)
1 files changed, 3 insertions, 0 deletions
diff --git a/ggml-sycl.cpp b/ggml-sycl.cpp
index eac8f557..f486b6c0 100644
--- a/ggml-sycl.cpp
+++ b/ggml-sycl.cpp
@@ -14454,6 +14454,9 @@ inline void ggml_sycl_op_rope(const ggml_tensor *src0, const ggml_tensor *src1,
                               ggml_tensor *dst, const float *src0_dd,
                               const float *src1_dd, float *dst_dd,
                               const dpct::queue_ptr &main_stream) {
+#pragma message("TODO: implement phi3 frequency factors support")
+#pragma message("      https://github.com/ggerganov/llama.cpp/pull/7225")
+    GGML_ASSERT(dst->src[2] == nullptr && "phi3 frequency factors not implemented yet");
 
     GGML_ASSERT(src0->type == GGML_TYPE_F32 || src0->type == GGML_TYPE_F16);
     GGML_ASSERT( dst->type == GGML_TYPE_F32 ||  dst->type == GGML_TYPE_F16);
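
Note on the hunk above: the new assert rejects any rope op that carries a third source tensor (dst->src[2]), because the SYCL backend does not yet apply frequency factors. As a rough illustration of what the TODO refers to, the sketch below shows one way per-pair frequency factors could enter the rotation angle, by dividing each angle by its factor. This is an assumption-laden sketch, not the ggml implementation: `rope_angles` and `select_freq_factors` are hypothetical helpers, YaRN-style scaling is left out, and the behaviour when the context size exactly equals the trained size is a guess (the commit message only says the short factors are used when the context is smaller).

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper (not a ggml function): returns the rotation angle for
// each dimension pair at token position `pos`. When `freq_factors` is
// non-null, the angle of pair i is divided by freq_factors[i]; a null pointer
// means "no factors", i.e. plain RoPE.
std::vector<float> rope_angles(int pos, int n_dims, float freq_base,
                               const float * freq_factors) {
    assert(n_dims > 0 && n_dims % 2 == 0);

    // Geometric step between the base angles of consecutive pairs.
    const float theta_scale = std::pow(freq_base, -2.0f / n_dims);

    std::vector<float> angles(n_dims / 2);
    float theta = static_cast<float>(pos);
    for (int i = 0; i < n_dims / 2; ++i) {
        const float ff = freq_factors ? freq_factors[i] : 1.0f;
        angles[i] = theta / ff;   // assumption: the factor divides the angle
        theta *= theta_scale;
    }
    return angles;
}

// Hypothetical selection rule, following the commit message: use the short
// factors when the requested context is smaller than the trained context.
// Treating the equal case as "short" is an assumption.
const std::vector<float> & select_freq_factors(int n_ctx, int n_ctx_train,
                                               const std::vector<float> & short_factors,
                                               const std::vector<float> & long_factors) {
    return n_ctx > n_ctx_train ? long_factors : short_factors;
}
```

With no factors the helper reduces to the usual RoPE angles, which is the only case the asserted SYCL path accepts at this commit.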