sampling : refactor init to use llama_sampling_params (#3696)

* sampling : refactor init to use llama_sampling_params
* llama : combine repetition, frequency and presence penalties in 1 call
* examples : remove embd-input and gptneox-wip
* sampling : rename penalty params + reduce size of "prev" vector
* sampling : add llama_sampling_print helper
* sampling : hide prev behind API and apply #3661

ggml-ci
author: Georgi Gerganov <ggerganov@gmail.com> 2023-10-20 21:07:23 +0300
committer: GitHub <noreply@github.com> 2023-10-20 21:07:23 +0300
commit: d1031cf49c3b958b915fd558e23453471c29ac33
tree: 14fa2bc6d54d5e27bd1e8bfd6fa4dbf894dbe6b9 /README.md
parent: 8cf19d60dc93809db8e51fedc811595eed9134c5
1 files changed, 0 insertions, 1 deletions
diff --git a/README.md b/README.md
index ce63c6f0..49bb556a 100644
--- a/README.md
+++ b/README.md
@@ -962,7 +962,6 @@ docker run --gpus all -v /path/to/models:/models local/llama.cpp:light-cuda -m /
 
 - [main](./examples/main/README.md)
 - [server](./examples/server/README.md)
-- [embd-input](./examples/embd-input/README.md)
 - [jeopardy](./examples/jeopardy/README.md)
 - [BLIS](./docs/BLIS.md)
 - [Performance troubleshooting](./docs/token_generation_performance_tips.md)