author | Ebey Abraham <ebey97@gmail.com> | 2023-12-18 17:27:47 +0000 |
---|---|---|
committer | GitHub <noreply@github.com> | 2023-12-18 19:27:47 +0200 |
commit | b9e74f9bca5fdf7d0a22ed25e7a9626335fdfa48 (patch) | |
tree | b150a0d4490627bfc9cdd758d08d026fc70b0882 /ggml.h | |
parent | 3c04bf6da89eaf4c7d317e0518f0687dfcbf2de7 (diff) |
llama : add phi-2 + fix NeoX rope + ggml_mul_mat_set_prec (#4490)
* phi2 implementation
* fix breaking change
* phi-2 : various fixes
* phi-2 : use layer norm eps
* py : whitespaces
* llama : fix meta KV override bug
* convert : phi don't add BOS token
* convert : revert "added_tokens_decoder" change
* phi-2 : scale Q instead of KQ for better precision
* ggml : fix NeoX rope to rotate just first n_dims
* cuda : less diff in the rope_neox kernel
* ggml : add ggml_mul_mat_set_prec
ggml-ci
* Update ggml-cuda.cu
Co-authored-by: slaren <slarengh@gmail.com>
* Update ggml-cuda.cu
Co-authored-by: slaren <slarengh@gmail.com>
* cuda : ggml_cuda_op_mul_mat_cublas support F32 precision
* cuda : remove obsolete comment
---------
Co-authored-by: Ebey Abraham <ebeyabraham@microsoft.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: slaren <slarengh@gmail.com>
Diffstat (limited to 'ggml.h')
-rw-r--r-- | ggml.h | 12 |
1 files changed, 12 insertions, 0 deletions
@@ -343,6 +343,12 @@ extern "C" {
         GGML_TYPE_COUNT,
     };
 
+    // precision
+    enum ggml_prec {
+        GGML_PREC_DEFAULT,
+        GGML_PREC_F32,
+    };
+
     enum ggml_backend_type {
         GGML_BACKEND_CPU = 0,
         GGML_BACKEND_GPU = 10,
@@ -1057,6 +1063,12 @@ extern "C" {
             struct ggml_tensor * a,
             struct ggml_tensor * b);
 
+    // change the precision of a matrix multiplication
+    // set to GGML_PREC_F32 for higher precision (useful for phi-2)
+    GGML_API void ggml_mul_mat_set_prec(
+            struct ggml_tensor * a,
+            enum ggml_prec prec);
+
     // indirect matrix multiplication
     // ggml_mul_mat_id(ctx, as, ids, id, b) ~= ggml_mul_mat(as[ids[id]], b)
     GGML_API struct ggml_tensor * ggml_mul_mat_id(
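
The header only declares `ggml_mul_mat_set_prec`; its implementation is outside this diff. To illustrate the calling pattern the declaration suggests (tag an existing matrix-multiplication node with a precision after it is created), here is a minimal standalone sketch. It does not link against ggml: `mock_tensor`, its `is_mul_mat` and `op_params` fields, `mock_mul_mat_set_prec`, and `mock_mul_mat_get_prec` are hypothetical stand-ins, and storing the precision in parameter slot 0 is an assumption, not something this diff confirms. Only the `ggml_prec` enum is copied from the diff.

```c
// Standalone mock: NOT ggml's real types or implementation.
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Copied from the diff: GGML_PREC_DEFAULT is 0, GGML_PREC_F32 is 1.
enum ggml_prec {
    GGML_PREC_DEFAULT,
    GGML_PREC_F32,
};

// Hypothetical stand-in for a compute-graph node.
struct mock_tensor {
    int     is_mul_mat;    // 1 if this node is the result of a matrix multiplication
    int32_t op_params[4];  // per-op parameters; slot 0 holds the precision (assumption)
};

// Hypothetical setter with the same argument shape as ggml_mul_mat_set_prec:
// it records the requested precision on an existing mul_mat node.
static void mock_mul_mat_set_prec(struct mock_tensor * a, enum ggml_prec prec) {
    assert(a != NULL);
    assert(a->is_mul_mat);  // only meaningful for matrix-multiplication nodes
    a->op_params[0] = (int32_t) prec;
}

// Hypothetical reader that a backend could use to choose its accumulation type.
static enum ggml_prec mock_mul_mat_get_prec(const struct mock_tensor * a) {
    assert(a != NULL);
    return (enum ggml_prec) a->op_params[0];
}
```

Under these assumptions, a caller would create the multiplication node first and then request `GGML_PREC_F32` on it. A zero-initialized node reports `GGML_PREC_DEFAULT`, so existing graphs keep their current behavior unless they opt in.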