diff options
author    | Luciano <lucianostrika44@gmail.com> | 2023-03-24 08:05:13 -0700
----------|-----------------------------------------|---------------------------
committer | GitHub <noreply@github.com> | 2023-03-24 17:05:13 +0200
commit    | 8d4a855c241ecb0f3ddc03447fe56002ebf27a37 (patch) |
tree      | 4de329fb2849fb6128d05237850b8ceb7519bf36 /llama.h |
parent    | b6b268d4415fd3b3e53f22b6619b724d4928f713 (diff) |
Add embedding mode with arg flag. Currently working (#282)
* working but ugly
* add arg flag, not working on embedding mode
* typo
* Working! Thanks to @nullhook
* make params argument instead of hardcoded boolean. remove useless time check
* start doing the instructions but not finished. This probably doesn't compile
* Embeddings extraction support
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Diffstat (limited to 'llama.h')
-rw-r--r-- | llama.h | 5 +++++
1 file changed, 5 insertions(+), 0 deletions(-)
```diff
--- a/llama.h
+++ b/llama.h
@@ -53,6 +53,7 @@ extern "C" {
         bool f16_kv;     // use fp16 for KV cache
         bool logits_all; // the llama_eval() call computes all logits, not just the last one
         bool vocab_only; // only load the vocabulary, no weights
+        bool embedding;  // embedding mode only
     };

     LLAMA_API struct llama_context_params llama_context_default_params();
@@ -108,6 +109,10 @@ extern "C" {
     // Cols: n_vocab
     LLAMA_API float * llama_get_logits(struct llama_context * ctx);

+    // Get the embeddings for the input
+    // shape: [n_embd] (1-dimensional)
+    LLAMA_API float * llama_get_embeddings(struct llama_context * ctx);
+
     // Token Id -> String. Uses the vocabulary in the provided context
     LLAMA_API const char * llama_token_to_str(struct llama_context * ctx, llama_token token);
```
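As a rough sketch of how the new API fits together: set the `embedding` flag in the context params, evaluate a prompt, then read the embedding vector back with `llama_get_embeddings()`. This is not from the patch itself; it assumes the llama.cpp C API of this era (`llama_init_from_file`, `llama_n_embd`, `llama_free`), the model path is hypothetical, and the tokenize/eval steps are abbreviated.

```c
// Sketch only: assumes the llama.cpp C API around this commit.
// The model path and the elided tokenize/eval steps are placeholders.
#include <stdio.h>
#include "llama.h"

int main(void) {
    struct llama_context_params params = llama_context_default_params();
    params.embedding = true;  // the flag added by this patch

    // Hypothetical model path for illustration
    struct llama_context * ctx =
        llama_init_from_file("models/7B/ggml-model-q4_0.bin", params);
    if (ctx == NULL) {
        return 1;
    }

    // ... tokenize a prompt and run llama_eval() on the tokens ...

    // After evaluation, the input's embedding vector is available:
    float * emb = llama_get_embeddings(ctx);  // shape: [n_embd]
    const int n_embd = llama_n_embd(ctx);
    for (int i = 0; i < n_embd; i++) {
        printf("%f ", emb[i]);
    }
    printf("\n");

    llama_free(ctx);
    return 0;
}
```

Note that unlike `llama_get_logits()`, which returns a `[n_tokens, n_vocab]` matrix, the patch documents `llama_get_embeddings()` as returning a single one-dimensional `[n_embd]` vector.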