summaryrefslogtreecommitdiff
path: root/common/sampling.h
AgeCommit message (Collapse)Author
2025-06-03Adding top-n-sigma sampler (#489)Kawrakow
* Adding top-n-sigma sampler * Fix typos in XTC PR * Update README.md for main and server * More README * More README --------- Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2025-06-03Adding the XTC sampler (#486)Kawrakow
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2024-05-22common : normalize naming style (#7462)Georgi Gerganov
* common : normalize naming style ggml-ci * common : match declaration / definition order * zig : try to fix build
2024-05-11server: fix reported top tokens for temperature 0 (#7203)Johannes Gäßler
2024-05-07server: fix incorrectly reported token probabilities (#7125)Johannes Gäßler
* server: normalize token probabilities * fix temperature == 0.0f
2024-04-24Server: fix seed for multiple slots (#6835)Johannes Gäßler
* Server: add tests for consistent results * sampling: separate rng per sampling context
2024-04-08llama : support negative ith in llama_get_ API (#6519)Rick G
* llama_sampling_sample with default args is more naively usable * Batches populated by either llama_batch_get_one or llama_batch_add work with default args * Previously get_one could use the default argument * Previously add should usually have used the last index where logits[idx] == true * This hopefully encourages the use of llama_batch_add * By giving expected results when using default arguments. * Adds "negative indexing" feature to llama_get_logits_ith and llama_get_embeddings_ith * Believed to work with any currently well behaved program * Default arg now works for both cases (previously would give strange results for add case) * Any non-negative number is unaffected and behaves as previously * Negative arguments were previously invalid. * Implemented as a special case of indexing as suggested by @compilade in https://github.com/ggerganov/llama.cpp/pull/6519 * Fixed mismatch type errors * cited in macOS CI tests * Missed in original updates based on PR feedback in https://github.com/ggerganov/llama.cpp/pull/6519
2024-03-24sampling : deduplicated code for probability distribution access (#6240)Minsoo Cheong
* sampling: remove duplicated code for probability distribution access * free original_logits * fix original_logits allocation * fixes based on review @cebtenzzre * change function name to `llama_sampling_prepare`
2024-03-19common : disable repeat penalties by default (#6127)Georgi Gerganov
2024-03-04speculative : implement stochastic speculative sampling (#5625)Minsoo Cheong
* (WIP) Implement stochastic speculative decoding * sample from residual distribution on draft accept failure * fix #5657: force greedy sampling with probs when temp is 0 * remove p_accept parameter * fix style * remove unused variables * add srand() in speculative.cpp * replace use of rand() with mt19937 sampling * fixes based on review (@JohannesGaessler) * fix r random generation * randomly select next sequence to verify + fix bug in memory freeing * fix bug in active_seqs sync * fix uniform int distribution initialization * remove warnings from comparison between int and size_t * check grammar in `llama_sample_probability_distribution_impl` * remove malloc code by utilizing vectors * add PR link to README
2024-02-18common, server : surface min_keep as its own parameter (#5567)Robey Holderith
* Feature - surface min_keep as its own parameter * Updated README with min_keep param
2024-02-16server : add "samplers" param to control the samplers order (#5494)Alexey Parfenov
2024-02-11common : use enums for sampler types (#5418)Alexey Parfenov
* common: use enums for sampler types * Apply suggestions from code review Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * minor : spaces --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-01-25llama : dynamic temperature sampling (#4972)l3utterfly
* implemented dynamic temperature sampling from koboldcpp * removed trailing whitespace * removed unused temp parameter in llama_sample_entropy * exposed exponent_val in dynamic temp sampler * added debug check for printf statements * use nullptr in llama_sample_softmax call during llama_sample_entropy this avoids counting the time taken stats twice Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * return earlier if there is only 1 candiate (i.e. max_entropy == 0) * reformat 't' case in llama_sample_queue Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com> * check for one or zero candidates case in llama_sample_entropy --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>
2024-01-17llama : fix copy/paste error in llama_sampling_params comment (#4994)David Renshaw
2023-12-23server : allow to specify custom prompt for penalty calculation (#3727)Alexey Parfenov
2023-12-05sampling : custom samplers order (#4285)MaggotHATE
* Samplers sequence order w parameter * Cleaned commented code * Fixed formatting * Rewrote with unordered_map * Revert and rewrite, too many problems and safeguards would be needed * Fixed code style * Code style fixes according to review * More readable samplers input string, fixed help * Style fix in sampler_queue * Formatting fixes * Fixing whitespaces
2023-10-31samplers : Min-P sampler implementation [alternative to Top P/Top K] (#3841)kalomaze
* Introduce the new Min-P sampler by @kalomaze The Min-P sampling method was designed as an alternative to Top-P, and aims to ensure a balance of quality and variety. The parameter *p* represents the minimum probability for a token to be considered, relative to the probability of the most likely token. * Min-P enabled and set to 0.05 default --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> Co-authored-by: cebtenzzre <cebtenzzre@gmail.com>
2023-10-20sampling : refactor init to use llama_sampling_params (#3696)Georgi Gerganov
* sampling : refactor init to use llama_sampling_params * llama : combine repetition, frequency and presence penalties in 1 call * examples : remove embd-input and gptneox-wip * sampling : rename penalty params + reduce size of "prev" vector * sampling : add llama_sampling_print helper * sampling : hide prev behind API and apply #3661 ggml-ci
2023-10-18speculative : add tree-based sampling example (#3624)Georgi Gerganov
* sampling : one sequence per sampling context ggml-ci * speculative : add tree-based sampling support ggml-ci * speculative : reuse the n_parallel CLI param * speculative : refactor sampling * examples : fix build after sampling refactoring ggml-ci * batched : fix n_seq_id * sampling : fix malloc ggml-ci * swift : fix build ggml-ci * swift : try to fix build ggml-ci * prompts : add assistant.txt * common : add llama_batch_add() and llama_batch_clear() helpers * speculative : minor refactor ggml-ci * minor : comments + rename ggml-ci * speculative : fix off-by-one for n_drafted * speculative : fix the n_drafted fix + p constants
2023-10-11common : fix mirostat state when using multiple sequences (#3543)Kerfuffle
* Fix mirostat state when using multiple sequences * Fix mirostat by completely refactoring sampling! * Try to fix zig build. * Export function to fetch/create default sampler states Code formatting cleanups and add some comments Silence a warning about id not being used when logging is disabled * Apply some renaming suggestions. Fix comments that were out of sync with the pull. * Use more consistant naming convention for sampling contexts