author | Kerfuffle <44031344+KerfuffleV2@users.noreply.github.com> | 2023-10-22 12:14:56 -0600
---|---|---
committer | GitHub <noreply@github.com> | 2023-10-22 21:14:56 +0300
commit | a5e7dbd6141128bfa3c40a19c2945a181df625d3 (patch) |
tree | 14cb15291418d4f591d7a58d8239eb02b966b595 | /convert-baichuan-hf-to-gguf.py
parent | d3956aea53369455008159cc405ed4c496976692 (diff) |
llama : validate special token ids are in range when loading GGUF model (#3635)
* Add validation for special token ids to llama.cpp
Small optimization for llama_byte_to_token SPM mode
* Fix BPE newline check, only I could break something so simple
* Killll meeeeee
* Account for GGUF_KEY_KEY only setting when the key exists
* Minor code cleanups.
* Fix convert.py error msg when added tokens are out of range
* Make gguf SpecialVocab vocab size-aware
Update conversion scripts accordingly
* Avoid a string copy
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Diffstat (limited to 'convert-baichuan-hf-to-gguf.py')
-rwxr-xr-x | convert-baichuan-hf-to-gguf.py | 2 |
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/convert-baichuan-hf-to-gguf.py b/convert-baichuan-hf-to-gguf.py
index a1783f71..3b64ecb8 100755
--- a/convert-baichuan-hf-to-gguf.py
+++ b/convert-baichuan-hf-to-gguf.py
@@ -230,7 +230,7 @@
 gguf_writer.add_token_list(tokens)
 gguf_writer.add_token_scores(scores)
 gguf_writer.add_token_types(toktypes)
-special_vocab = gguf.SpecialVocab(dir_model)
+special_vocab = gguf.SpecialVocab(dir_model, n_vocab = len(tokens))
 special_vocab.add_to_gguf(gguf_writer)

 # TENSORS
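The one-line patch above passes the vocabulary size into `gguf.SpecialVocab`, so that special token ids (BOS, EOS, etc.) can be checked against the actual vocabulary before being written to the GGUF file. The following is a minimal sketch, not the actual llama.cpp or gguf-py implementation, of the kind of range check this commit introduces; the function name `validate_special_token_id` is hypothetical:

```python
# Hypothetical illustration of validating a special token id against the
# vocabulary size, as this commit does for ids loaded from a GGUF model.
def validate_special_token_id(token_id: int, n_vocab: int, name: str) -> bool:
    """Return True if token_id is a valid index into a vocab of n_vocab entries."""
    if 0 <= token_id < n_vocab:
        return True
    # An out-of-range id is reported and skipped rather than written/used.
    print(f"warning: special token '{name}' id {token_id} "
          f"is out of range [0, {n_vocab}); ignoring it")
    return False

# A BOS id of 1 fits a 32000-token vocab; an EOS id of 50000 does not.
ok_bos = validate_special_token_id(1, 32000, "bos")
ok_eos = validate_special_token_id(50000, 32000, "eos")
```

With `n_vocab = len(tokens)` supplied by the conversion script, the check can run at conversion time instead of failing only when the model is loaded.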