ik_llama.cpp.git

author	Kerfuffle <44031344+KerfuffleV2@users.noreply.github.com>	2023-08-30 02:25:50 -0600
committer	GitHub <noreply@github.com>	2023-08-30 11:25:50 +0300
commit	dc07dc492ef9640bbb82904d7c7679f7bdcf6d76 (patch)
tree	f9d80bc6ee29067e8e72521d75dfa2b92d85540e /llama.cpp
parent	ad9ddcff6ef322db5cf13785bd7c856b610d242e (diff)

convert : various script cleanups/fixes + merges and special token handling (#2842)

* convert: Fix permute calls and method/func definitions
* Cleanups for gguf-py
* Minor types cleanups.
* Initial implementation of handling merges and special tokens
* convert: Handle special tokens and merges in vocab only mode
  convert: Vocab only mode no longer requires loading model tensors
* gguf: Refactor tensor name mapping
* convert: Fix type hint for special_token_types in SpecialVocab
* Use common special vocab handling in various conversion scripts
* First pass at implementing suggested changes
* Second pass
* gguf: SpecialVocab: Fix issue with special token content not in a dict
  gguf: SpecialVocab: Allow skipping handling of merges
* convert-falcon-hf-to-gguf: Support --vocab-only option, bail out if no tokenizer.json
* convert-gptneox-hf-to-gguf and convert: Only handle merges for BPE tokenizer
* gguf: SpecialVocab: Actually set load_merges in object
* Uniform args parsing and vocab only mode for convert examples
* convert.py: Set gpt2 as tokenizer model when using BPE
* Squish last type warning in gguf.py - yay!

Diffstat (limited to 'llama.cpp')

0 files changed, 0 insertions, 0 deletions

