speculative : implement stochastic speculative sampling (#5625)

* (WIP) Implement stochastic speculative decoding
* sample from residual distribution on draft accept failure
* fix #5657: force greedy sampling with probs when temp is 0
* remove p_accept parameter
* fix style
* remove unused variables
* add srand() in speculative.cpp
* replace use of rand() with mt19937 sampling
* fixes based on review (@JohannesGaessler)
* fix r random generation
* randomly select next sequence to verify + fix bug in memory freeing
* fix bug in active_seqs sync
* fix uniform int distribution initialization
* remove warnings from comparison between int and size_t
* check grammar in `llama_sample_probability_distribution_impl`
* remove malloc code by utilizing vectors
* add PR link to README
author: Minsoo Cheong <54794500+mscheong01@users.noreply.github.com> 2024-03-05 03:24:00 +0900
committer: GitHub <noreply@github.com> 2024-03-04 20:24:00 +0200
commit: 6d341ab6c53cd51f2921d986d0090cc8b049b39a (patch)
tree: f212b497e210c8c73fe52369f6bc81297c7b1dab /common/sampling.h
parent: 4ffcdce2ff877ebb683cd217ea38faf20faa5ffe (diff)
1 files changed, 7 insertions, 0 deletions
diff --git a/common/sampling.h b/common/sampling.h
index 95d87539..48b2459d 100644
--- a/common/sampling.h
+++ b/common/sampling.h
@@ -131,6 +131,13 @@ llama_token llama_sampling_sample(
         struct llama_context * ctx_cfg,
         int idx = 0);
 
+// returns the probability that token of given id will be sampled
+llama_token_data_array llama_sampling_probability_distribution(
+        struct llama_sampling_context * ctx_sampling,
+        struct llama_context * ctx_main,
+        struct llama_context * ctx_cfg,
+        int idx = 0);
+
 void llama_sampling_accept(
         struct llama_sampling_context * ctx_sampling,
         struct llama_context * ctx_main,
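
The commit message mentions sampling from a residual distribution when a drafted token is not accepted, and using mt19937 for randomness. The following is a minimal, self-contained sketch of that general accept/reject step, not the code from this commit. The names `spec_result` and `speculative_accept` are hypothetical, and the sketch assumes dense, normalized probability vectors over a shared vocabulary rather than the `llama_token_data_array` returned by the new function.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical result type for illustration only (not part of llama.cpp).
struct spec_result {
    bool accepted; // true if the drafted token was kept
    int  token;    // the drafted token, or its replacement on rejection
};

// Hypothetical helper (assumption: p_tgt and p_dft are normalized
// probability vectors of equal length, indexed by token id).
static spec_result speculative_accept(
        const std::vector<float> & p_tgt,   // target model probabilities
        const std::vector<float> & p_dft,   // draft model probabilities
        int                        drafted, // token proposed by the draft model
        std::mt19937             & rng) {
    assert(p_tgt.size() == p_dft.size());
    assert(drafted >= 0 && (size_t) drafted < p_tgt.size());

    // Accept with probability min(1, p_tgt / p_dft), written without a
    // division so that p_dft == 0 does not need a special case.
    std::uniform_real_distribution<double> unif(0.0, 1.0);
    const double r = unif(rng);
    if (r * p_dft[drafted] < p_tgt[drafted]) {
        return { true, drafted };
    }

    // Rejected: build the residual weights max(0, p_tgt - p_dft).
    // A rejection implies p_tgt < p_dft at the drafted token, so for
    // normalized inputs some other token has a positive residual.
    std::vector<double> residual(p_tgt.size());
    double sum = 0.0;
    for (size_t i = 0; i < p_tgt.size(); ++i) {
        residual[i] = std::max(0.0, (double) p_tgt[i] - (double) p_dft[i]);
        sum += residual[i];
    }
    if (sum <= 0.0) {
        // Defensive fallback for unnormalized input: use the target weights.
        residual.assign(p_tgt.begin(), p_tgt.end());
    }

    // std::discrete_distribution normalizes the weights internally.
    std::discrete_distribution<int> dist(residual.begin(), residual.end());
    return { false, dist(rng) };
}
```

A caller would seed one `std::mt19937`, call the helper once per drafted token in order, and stop verifying the draft at the first result with `accepted == false`, using the returned replacement token instead.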