ik_llama.cpp.git - Unnamed repository; edit this file 'description' to name the repository.

Age	Commit message (Collapse)	Author
2025-06-03	Adding top-n-sigma sampler (#489)	Kawrakow
	* Adding top-n-sigma sampler * Fix typos in XTC PR * Update README.md for main and server * More README * More README --------- Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2025-06-03	Adding the XTC sampler (#486)	Kawrakow
	Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2024-05-22	common : normalize naming style (#7462)	Georgi Gerganov
	* common : normalize naming style ggml-ci * common : match declaration / definition order * zig : try to fix build
2024-05-11	server: fix reported top tokens for temperature 0 (#7203)	Johannes Gäßler

2024-05-07	server: fix incorrectly reported token probabilities (#7125)	Johannes Gäßler
	* server: normalize token probabilities * fix temperature == 0.0f
2024-04-24	Server: fix seed for multiple slots (#6835)	Johannes Gäßler
	* Server: add tests for consistent results * sampling: separate rng per sampling context
2024-04-08	llama : support negative ith in llama_get_ API (#6519)	Rick G
	* llama_sampling_sample with default args is more naively usable * Batches populated by either llama_batch_get_one or llama_batch_add work with default args * Previously get_one could use the default argument * Previously add should usually have used the last index where logits[idx] == true * This hopefully encourages the use of llama_batch_add * By giving expected results when using default arguments. * Adds "negative indexing" feature to llama_get_logits_ith and llama_get_embeddings_ith * Believed to work with any currently well behaved program * Default arg now works for both cases (previously would give strange results for add case) * Any non-negative number is unaffected and behaves as previously * Negative arguments were previously invalid. * Implemented as a special case of indexing as suggested by @compilade in https://github.com/ggerganov/llama.cpp/pull/6519 * Fixed mismatch type errors * cited in macOS CI tests * Missed in original updates based on PR feedback in https://github.com/ggerganov/llama.cpp/pull/6519
2024-03-24	sampling : deduplicated code for probability distribution access (#6240)	Minsoo Cheong
	* sampling: remove duplicated code for probability distribution access * free original_logits * fix original_logits allocation * fixes based on review @cebtenzzre * change function name to `llama_sampling_prepare`
2024-03-19	common : disable repeat penalties by default (#6127)	Georgi Gerganov

2024-03-04	speculative : implement stochastic speculative sampling (#5625)	Minsoo Cheong
	* (WIP) Implement stochastic speculative decoding * sample from residual distribution on draft accept failure * fix #5657: force greedy sampling with probs when temp is 0 * remove p_accept parameter * fix style * remove unused variables * add srand() in speculative.cpp * replace use of rand() with mt19937 sampling * fixes based on review (@JohannesGaessler) * fix r random generation * randomly select next sequence to verify + fix bug in memory freeing * fix bug in active_seqs sync * fix uniform int distribution initialization * remove warnings from comparison between int and size_t * check grammar in `llama_sample_probability_distribution_impl` * remove malloc code by utilizing vectors * add PR link to README
2024-02-18	common, server : surface min_keep as its own parameter (#5567)	Robey Holderith
	* Feature - surface min_keep as its own parameter * Updated README with min_keep param
2024-02-16	server : add "samplers" param to control the samplers order (#5494)	Alexey Parfenov

2024-02-11	common : use enums for sampler types (#5418)	Alexey Parfenov
	* common: use enums for sampler types * Apply suggestions from code review Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * minor : spaces --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-01-25	llama : dynamic temperature sampling (#4972)	l3utterfly
	* implemented dynamic temperature sampling from koboldcpp * removed trailing whitespace * removed unused temp parameter in llama_sample_entropy * exposed exponent_val in dynamic temp sampler * added debug check for printf statements * use nullptr in llama_sample_softmax call during llama_sample_entropy this avoids counting the time taken stats twice Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * return earlier if there is only 1 candiate (i.e. max_entropy == 0) * reformat 't' case in llama_sample_queue Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com> * check for one or zero candidates case in llama_sample_entropy --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>
2024-01-17	llama : fix copy/paste error in llama_sampling_params comment (#4994)	David Renshaw

2023-12-23	server : allow to specify custom prompt for penalty calculation (#3727)	Alexey Parfenov

2023-12-05	sampling : custom samplers order (#4285)	MaggotHATE
	* Samplers sequence order w parameter * Cleaned commented code * Fixed formatting * Rewrote with unordered_map * Revert and rewrite, too many problems and safeguards would be needed * Fixed code style * Code style fixes according to review * More readable samplers input string, fixed help * Style fix in sampler_queue * Formatting fixes * Fixing whitespaces
2023-10-31	samplers : Min-P sampler implementation [alternative to Top P/Top K] (#3841)	kalomaze
	* Introduce the new Min-P sampler by @kalomaze The Min-P sampling method was designed as an alternative to Top-P, and aims to ensure a balance of quality and variety. The parameter p represents the minimum probability for a token to be considered, relative to the probability of the most likely token. * Min-P enabled and set to 0.05 default --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> Co-authored-by: cebtenzzre <cebtenzzre@gmail.com>
2023-10-20	sampling : refactor init to use llama_sampling_params (#3696)	Georgi Gerganov
	* sampling : refactor init to use llama_sampling_params * llama : combine repetition, frequency and presence penalties in 1 call * examples : remove embd-input and gptneox-wip * sampling : rename penalty params + reduce size of "prev" vector * sampling : add llama_sampling_print helper * sampling : hide prev behind API and apply #3661 ggml-ci
2023-10-18	speculative : add tree-based sampling example (#3624)	Georgi Gerganov
	* sampling : one sequence per sampling context ggml-ci * speculative : add tree-based sampling support ggml-ci * speculative : reuse the n_parallel CLI param * speculative : refactor sampling * examples : fix build after sampling refactoring ggml-ci * batched : fix n_seq_id * sampling : fix malloc ggml-ci * swift : fix build ggml-ci * swift : try to fix build ggml-ci * prompts : add assistant.txt * common : add llama_batch_add() and llama_batch_clear() helpers * speculative : minor refactor ggml-ci * minor : comments + rename ggml-ci * speculative : fix off-by-one for n_drafted * speculative : fix the n_drafted fix + p constants
2023-10-11	common : fix mirostat state when using multiple sequences (#3543)	Kerfuffle
	* Fix mirostat state when using multiple sequences * Fix mirostat by completely refactoring sampling! * Try to fix zig build. * Export function to fetch/create default sampler states Code formatting cleanups and add some comments Silence a warning about id not being used when logging is disabled * Apply some renaming suggestions. Fix comments that were out of sync with the pull. * Use more consistant naming convention for sampling contexts