author    Kerfuffle <44031344+KerfuffleV2@users.noreply.github.com>  2023-10-22 12:14:56 -0600
committer GitHub <noreply@github.com>  2023-10-22 21:14:56 +0300
commit    a5e7dbd6141128bfa3c40a19c2945a181df625d3 (patch)
tree      14cb15291418d4f591d7a58d8239eb02b966b595 /convert-falcon-hf-to-gguf.py
parent    d3956aea53369455008159cc405ed4c496976692 (diff)
llama : validate special token ids are in range when loading GGUF model (#3635)
* Add validation for special token ids to llama.cpp
  Small optimization for llama_byte_to_token SPM mode
* Fix BPE newline check, only I could break something so simple
* Killll meeeeee
* Account for GGUF_KEY_KEY only setting when the key exists
* Minor code cleanups.
* Fix convert.py error msg when added tokens are out of range
* Make gguf SpecialVocab vocab size-aware
  Update conversion scripts accordingly
* Avoid a string copy

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Diffstat (limited to 'convert-falcon-hf-to-gguf.py')
-rwxr-xr-x  convert-falcon-hf-to-gguf.py | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/convert-falcon-hf-to-gguf.py b/convert-falcon-hf-to-gguf.py
index 1d98c51a..8e8f3c3f 100755
--- a/convert-falcon-hf-to-gguf.py
+++ b/convert-falcon-hf-to-gguf.py
@@ -152,7 +152,7 @@ gguf_writer.add_token_list(tokens)
 gguf_writer.add_token_scores(scores)
 gguf_writer.add_token_types(toktypes)
 
-special_vocab = gguf.SpecialVocab(dir_model, load_merges = True)
+special_vocab = gguf.SpecialVocab(dir_model, load_merges = True, n_vocab = len(tokens))
 special_vocab.add_to_gguf(gguf_writer)
 
 # TENSORS
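
The change passes n_vocab = len(tokens) so that gguf.SpecialVocab can reject
special token ids that point past the end of the vocabulary. A minimal sketch
of that idea follows; the class and method names are hypothetical, not the
actual gguf-py implementation:

    # sketch.py -- illustrative only, not the real gguf.SpecialVocab
    from typing import Optional

    class SpecialVocabSketch:
        SPECIAL_TOKEN_TYPES = ("bos", "eos", "unk", "sep", "pad")

        def __init__(self, n_vocab: Optional[int] = None) -> None:
            # When n_vocab is known, out-of-range ids can be dropped early.
            self.n_vocab = n_vocab
            self.special_token_ids: dict[str, int] = {}

        def set_special_token(self, typ: str, tid: int) -> None:
            if typ not in self.SPECIAL_TOKEN_TYPES:
                return
            # Accept the id only if it falls inside the known vocabulary.
            if self.n_vocab is None or 0 <= tid < self.n_vocab:
                self.special_token_ids[typ] = tid
            else:
                print(f"Ignoring out-of-range special token: {typ} = {tid}")

    # With n_vocab = len(tokens), a bogus id from a tokenizer config
    # (e.g. a pad id of 999999 against a 65024-token Falcon vocab) is
    # skipped instead of being written into the GGUF file.
    sv = SpecialVocabSketch(n_vocab=65024)
    sv.set_special_token("eos", 11)       # kept
    sv.set_special_token("pad", 999999)   # ignored

On the loading side, the commit adds the matching check to llama.cpp itself,
so a GGUF file that nevertheless carries an out-of-range special token id is
caught at model load time rather than at inference time.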