ik_llama.cpp.git


author	wonjun Jang <strutive07@gmail.com>	2023-12-14 17:09:34 +0900
committer	GitHub <noreply@github.com>	2023-12-14 10:09:34 +0200
commit	873637afc7924f435ac44c067630a28e82eefa7b (patch)
tree	82feb6a53b328eca8552304aca5007f26f768cff /ggml.c
parent	0353a1840134b24b07ab61fd4490192f28c4db6b (diff)

convert : support loading vocab from fast tokenizer config (#3633)

* Add HFVocab into convert.py
* Update convert.py
* Update convert.py
* add bytes_to_unicode function
* change add_meta_vocab function
* remove debug code
* remove byte_encoder
* Add newline between classes
* Check tokenizer.json when tokenizer.model does not exist.
* Move transformers dependency to local code
* Add error context with 'raise from'
* Add fast tokenizer option to BpeVocab
* Update convert.py
* Add VocabLoader and remove *Vocab class
* Add transformers dependency
* remove added tokens and check newline token to decide spm or bpe
* Update convert.py
* Add special token type
* Update convert.py
* Update convert.py
* Update convert.py
* Fix typo in convert.py
* Fix when params.n_vocab < tokenizer vocab size
* update vocab class
* change function name
* Remove unused variable/functions, add types to class variable and methods, delete blank lines
* fix flake8 warnings
* code style cleanup
* make mypy happy
* change exception

---------

Co-authored-by: Jared Van Bortel <jared@nomic.ai>
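Two items in the log above ("add bytes_to_unicode function" and "check newline token to decide spm or bpe") describe techniques that can be illustrated in isolation. The sketch below is not the code from this commit. It assumes the conventional GPT-2 byte-to-Unicode mapping, and `guess_vocab_type` is a hypothetical helper showing one way a newline token could distinguish a SentencePiece vocab (which stores newline as the byte token `<0x0A>`) from a byte-level BPE vocab (which stores it as the remapped character `Ċ`).

```python
from __future__ import annotations


def bytes_to_unicode() -> dict[int, str]:
    """Map every byte value 0..255 to a printable Unicode character (GPT-2 convention)."""
    # Printable Latin-1 ranges keep their own code points.
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    # All remaining bytes (control characters, space, etc.) are shifted to
    # code points 256, 257, ... so that every byte has a visible stand-in.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


def guess_vocab_type(vocab: dict[str, int]) -> str:
    """Hypothetical heuristic: decide 'spm' vs 'bpe' from how newline is stored."""
    if "<0x0A>" in vocab:  # SentencePiece byte-fallback token for '\n'
        return "spm"
    if bytes_to_unicode()[ord("\n")] in vocab:  # 'Ċ' (U+010A) in byte-level BPE
        return "bpe"
    raise ValueError("cannot determine vocab type from newline token")
```

With this mapping, the space byte becomes `Ġ` (U+0120) and newline becomes `Ċ` (U+010A), which is why byte-level BPE vocab files show those characters in place of whitespace.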

Diffstat (limited to 'ggml.c')

0 files changed, 0 insertions, 0 deletions
