ik_llama.cpp.git

author	David Friehs <david@friehs.info>	2024-01-13 17:29:43 +0100
committer	GitHub <noreply@github.com>	2024-01-13 18:29:43 +0200
commit	df845cc982e7e2ea7b9900e29d55b15338faa78d (patch)
tree	07c1eb5f5b9a3ac21fa70e499029907d9d90b008 /examples/quantize/README.md
parent	6b48ed089377330cdb362970a51c1c89b6d857a8 (diff)

llama : minimize size used for state save/load (#4820)

* examples : save-load-state: save only required state

* llama : only reserve n_vocab * n_batch at most for logits

  llama_decode asserts that only n_batch tokens are passed each call, and n_ctx is expected to be bigger than n_batch.

* llama : always reserve n_vocab * n_batch for logits

  llama_context de-serialization breaks if the contexts have differing capacity for logits and llama_decode will at maximum resize to n_vocab * n_batch.

* llama : only save and restore used logits

  for batch sizes of 512 this reduces save state in the best case by around 62 MB, which can be a lot if planning to save on each message to allow regenerating messages.

* llama : use ostringstream and istringstream for save and load

* llama : serialize rng into minimum amount of space required

* llama : break session version due to serialization changes
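
As a rough illustration of the string-stream approach named in the commit message, the sketch below round-trips a random number generator through std::ostringstream and std::istringstream, so only the bytes the stream actually produces need to be stored. It assumes the generator is a std::mt19937; the helper names serialize_rng, deserialize_rng and rng_roundtrip_ok are hypothetical and are not taken from the llama.cpp sources.

```cpp
#include <cassert>
#include <random>
#include <sstream>
#include <string>

// Hypothetical helper: write the engine state as text via operator<<.
// The returned string is exactly as long as the state requires.
std::string serialize_rng(const std::mt19937 & rng) {
    std::ostringstream oss;
    oss << rng;
    assert(!oss.fail());
    return oss.str();
}

// Hypothetical helper: restore the engine state via operator>>.
// Returns false if the text could not be parsed as an engine state.
bool deserialize_rng(const std::string & state, std::mt19937 & rng) {
    std::istringstream iss(state);
    iss >> rng;
    return !iss.fail();
}

// Round-trip check: a restored engine must continue the same sequence
// as the engine it was saved from.
bool rng_roundtrip_ok(unsigned seed) {
    std::mt19937 saved(seed);
    saved.discard(10);                  // advance so the state is not the initial one
    const std::string state = serialize_rng(saved);

    std::mt19937 restored;              // starts from the default seed
    if (!deserialize_rng(state, restored)) {
        return false;
    }
    for (int i = 0; i < 100; ++i) {
        if (saved() != restored()) {
            return false;
        }
    }
    return true;
}
```

A caller that writes this string into a larger save file would typically store its length first and then the bytes, so a reader knows how much to consume; that framing is an assumption of this sketch rather than something stated in the commit message.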

Diffstat (limited to 'examples/quantize/README.md')

0 files changed, 0 insertions, 0 deletions

