diff options
author | DAN™ <dranger003@gmail.com> | 2024-03-10 11:56:30 -0400 |
---|---|---|
committer | GitHub <noreply@github.com> | 2024-03-10 17:56:30 +0200 |
commit | bcebd7dbf62fd7b293d5ed089023e4e733269c71 (patch) | |
tree | da8a1c4a76dfa9044a2bda8d1c58caaedd34bf4d /llama.h | |
parent | 2960eae847f8dbde23be6d170a61bcf44ebf32de (diff) |
llama : add support for GritLM (#5959)
* add gritlm example
* gritlm results match
* tabs to spaces
* comment out debug printing
* rebase to new embed
* gritlm embeddings are back babeee
* add to gitignore
* allow to toggle embedding mode
* Clean-up GritLM sample code.
* Fix types.
* Flush stdout and output ending newline if streaming.
* mostly style fixes; correct KQ_mask comment
* add causal_attn flag to llama_cparams
* gritml : minor
* llama : minor
---------
Co-authored-by: Douglas Hanley <thesecretaryofwar@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Diffstat (limited to 'llama.h')
-rw-r--r-- | llama.h | 4 |
1 files changed, 4 insertions, 0 deletions
@@ -643,6 +643,10 @@ extern "C" {
     // n_threads_batch is the number of threads used for prompt and batch processing (multiple tokens)
     LLAMA_API void llama_set_n_threads(struct llama_context * ctx, uint32_t n_threads, uint32_t n_threads_batch);
 
+    // Set whether to use causal attention or not
+    // If set to true, the model will only attend to the past tokens
+    LLAMA_API void llama_set_causal_attn(struct llama_context * ctx, bool causal_attn);
+
     // Set abort callback
     LLAMA_API void llama_set_abort_callback(struct llama_context * ctx, ggml_abort_callback abort_callback, void * abort_callback_data);