author: David Friehs <david@friehs.info> 2024-01-13 17:29:43 +0100
committer: GitHub <noreply@github.com> 2024-01-13 18:29:43 +0200
commit: df845cc982e7e2ea7b9900e29d55b15338faa78d
tree: 07c1eb5f5b9a3ac21fa70e499029907d9d90b008 /examples/quantize/README.md
parent: 6b48ed089377330cdb362970a51c1c89b6d857a8
llama : minimize size used for state save/load (#4820)
* examples : save-load-state: save only required state

* llama : only reserve n_vocab * n_batch at most for logits

  llama_decode asserts that only n_batch tokens are passed each call, and
  n_ctx is expected to be bigger than n_batch.

* llama : always reserve n_vocab * n_batch for logits

  llama_context de-serialization breaks if the contexts have differing
  capacity for logits and llama_decode will at maximum resize to
  n_vocab * n_batch.

* llama : only save and restore used logits

  for batch sizes of 512 this reduces save state in the best case by
  around 62 MB, which can be a lot if planning to save on each message
  to allow regenerating messages.

* llama : use ostringstream and istringstream for save and load

* llama : serialize rng into minimum amount of space required

* llama : break session version due to serialization changes
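
The size figure and the "only save and restore used logits" idea can be illustrated with a minimal sketch. This is not the code from the commit: the helper names `full_logits_bytes`, `save_used_logits`, and `load_used_logits` are hypothetical, and the on-disk layout (a 64-bit count followed by raw floats) is an assumption made for illustration. Assuming 4-byte floats and a vocabulary of 32000 entries (an assumed value, not stated in the commit), a full buffer of n_vocab * n_batch logits at n_batch = 512 takes 32000 * 512 * 4 = 65,536,000 bytes, roughly 62.5 MiB, which is consistent with the "around 62 MB" quoted above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: bytes occupied by a full logits buffer of
// n_vocab * n_batch floats.
inline std::size_t full_logits_bytes(std::size_t n_vocab, std::size_t n_batch) {
    return n_vocab * n_batch * sizeof(float);
}

// Hypothetical helper: serialize only the first n_used logits, prefixed by
// their count, instead of the whole reserved buffer.
// The layout (uint64 count + raw floats) is an assumption for illustration.
inline std::string save_used_logits(const std::vector<float> & logits, std::size_t n_used) {
    assert(n_used <= logits.size());
    std::ostringstream out(std::ios::binary);
    const std::uint64_t count = n_used;
    out.write(reinterpret_cast<const char *>(&count), sizeof(count));
    out.write(reinterpret_cast<const char *>(logits.data()), n_used * sizeof(float));
    return out.str();
}

// Hypothetical helper: restore into a buffer that always has the full
// reserved capacity, so contexts with equal capacity stay compatible.
inline std::vector<float> load_used_logits(const std::string & blob,
                                           std::size_t capacity,
                                           std::size_t & n_used) {
    std::istringstream in(blob, std::ios::binary);
    std::uint64_t count = 0;
    in.read(reinterpret_cast<char *>(&count), sizeof(count));
    assert(count <= capacity);
    std::vector<float> logits(capacity, 0.0f);
    in.read(reinterpret_cast<char *>(logits.data()), count * sizeof(float));
    n_used = static_cast<std::size_t>(count);
    return logits;
}
```

In this sketch, a state with only a handful of used logits serializes to a few bytes plus the count, while the restored buffer keeps its full reserved capacity.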
Diffstat (limited to 'examples/quantize/README.md')
0 files changed, 0 insertions, 0 deletions