llama : add phi-2 + fix NeoX rope + ggml_mul_mat_set_prec (#4490)

* phi2 implementation
* fix breaking change
* phi-2 : various fixes
* phi-2 : use layer norm eps
* py : whitespaces
* llama : fix meta KV override bug
* convert : phi don't add BOS token
* convert : revert "added_tokens_decoder" change
* phi-2 : scale Q instead of KQ for better precision
* ggml : fix NeoX rope to rotate just first n_dims
* cuda : less diff in the rope_neox kernel
* ggml : add ggml_mul_mat_set_prec

ggml-ci

* Update ggml-cuda.cu

Co-authored-by: slaren <slarengh@gmail.com>

* Update ggml-cuda.cu

Co-authored-by: slaren <slarengh@gmail.com>

* cuda : ggml_cuda_op_mul_mat_cublas support F32 precision
* cuda : remove obsolete comment

---------

Co-authored-by: Ebey Abraham <ebeyabraham@microsoft.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: slaren <slarengh@gmail.com>
author: Ebey Abraham <ebey97@gmail.com> 2023-12-18 17:27:47 +0000
committer: GitHub <noreply@github.com> 2023-12-18 19:27:47 +0200
commit: b9e74f9bca5fdf7d0a22ed25e7a9626335fdfa48 (patch)
tree: b150a0d4490627bfc9cdd758d08d026fc70b0882 /ggml.h
parent: 3c04bf6da89eaf4c7d317e0518f0687dfcbf2de7 (diff)
1 files changed, 12 insertions, 0 deletions
diff --git a/ggml.h b/ggml.h
index 68f7833b..f1003984 100644
--- a/ggml.h
+++ b/ggml.h
@@ -343,6 +343,12 @@ extern "C" {
         GGML_TYPE_COUNT,
     };
 
+    // precision
+    enum ggml_prec {
+        GGML_PREC_DEFAULT,
+        GGML_PREC_F32,
+    };
+
     enum ggml_backend_type {
         GGML_BACKEND_CPU = 0,
         GGML_BACKEND_GPU = 10,
@@ -1057,6 +1063,12 @@ extern "C" {
             struct ggml_tensor  * a,
             struct ggml_tensor  * b);
 
+    // change the precision of a matrix multiplication
+    // set to GGML_PREC_F32 for higher precision (useful for phi-2)
+    GGML_API void ggml_mul_mat_set_prec(
+            struct ggml_tensor * a,
+            enum ggml_prec       prec);
+
     // indirect matrix multiplication
     //  ggml_mul_mat_id(ctx, as, ids, id, b) ~= ggml_mul_mat(as[ids[id]], b)
     GGML_API struct ggml_tensor * ggml_mul_mat_id(