author: jaime-m-p <167997752+jaime-m-p@users.noreply.github.com>, 2024-05-28 21:46:34 +0200
committer: GitHub <noreply@github.com>, 2024-05-28 21:46:34 +0200
commit: 02c1ecad07f0e2d2febe8196271bcc64bdc9c006
tree: 2208298e9ac6bd0743787d02f35b527f7db47d0b /ggml.c
parent: 6bd12ce409f949012935b7d1b15a21ffa473a565
Tokenizer WPM fixes (#7500)
* Update random test: add_bos_token.
* Update random test: add WPM models for testing.
* Build vocab.special_tokens_cache using vocab token types.
* Fix and improve WPM preprocessing.
  - Fix Unicode edge case combinations.
  - Split by whitespace in the same pass.
* Discard all tokens when no match is found.
Diffstat (limited to 'ggml.c')
0 files changed, 0 insertions, 0 deletions