| field | value | date |
|---|---|---|
| author | Douglas Hanley <thesecretaryofwar@gmail.com> | 2024-02-28 02:51:11 -0600 |
| committer | GitHub <noreply@github.com> | 2024-02-28 10:51:11 +0200 |
| commit | 177628bfd85565070916ad66a5ac4071ee0527d8 | |
| tree | 1532ad96e287a0d8bff4aef92bf2e04eabecec9e /examples/server/server.cpp | |
| parent | 6c4416868df2e5455da7d20547f62bcf9735ba8e | |
llama : improve BERT tokenization (#5740)
* implement nfd for stripping accents in wpm tokenizer
* sort nfd map; reuse iterator
* use builtin tolower
* add locale include
* Simplify to_lower cases
Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>
---------
Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>
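The first two bullets describe stripping accents in the WordPiece (wpm) tokenizer through NFD decomposition driven by a sorted lookup table. The diff for this commit is not shown on this page, so what follows is only a minimal C++ sketch of that general technique, not the code from the commit. `NfdEntry`, `kNfdTable`, `strip_accent` and `strip_accents` are hypothetical names. The table holds just five Latin-1 entries, and the input is assumed to be already decoded into Unicode code points. Because the table is sorted, a lookup can use a binary search with `std::lower_bound` instead of a linear scan.

```cpp
#include <algorithm>
#include <cstdint>
#include <iterator>
#include <vector>

// Hypothetical entry: a precomposed code point and the base code point
// left after NFD decomposition once the combining mark is discarded.
struct NfdEntry {
    uint32_t composed;
    uint32_t base;
};

// Hypothetical, tiny excerpt of a decomposition table, sorted by `composed`
// so it can be binary-searched. A real table would cover far more code points.
static const NfdEntry kNfdTable[] = {
    {0x00C0, 0x0041}, // U+00C0 A with grave -> A
    {0x00C9, 0x0045}, // U+00C9 E with acute -> E
    {0x00E0, 0x0061}, // U+00E0 a with grave -> a
    {0x00E9, 0x0065}, // U+00E9 e with acute -> e
    {0x00F1, 0x006E}, // U+00F1 n with tilde -> n
};

// Returns the base code point if `cp` appears in the sorted table,
// otherwise returns `cp` unchanged.
uint32_t strip_accent(uint32_t cp) {
    const NfdEntry* first = std::begin(kNfdTable);
    const NfdEntry* last = std::end(kNfdTable);
    const NfdEntry* it = std::lower_bound(first, last, cp,
        [](const NfdEntry& e, uint32_t v) { return e.composed < v; });
    if (it != last && it->composed == cp) {
        return it->base;
    }
    return cp;
}

// Strips accents from a sequence of code points.
std::vector<uint32_t> strip_accents(const std::vector<uint32_t>& cps) {
    std::vector<uint32_t> out;
    out.reserve(cps.size());
    for (uint32_t cp : cps) {
        // Drop standalone marks from the Combining Diacritical Marks block
        // (U+0300..U+036F); other combining-mark ranges are not handled here.
        if (cp >= 0x0300 && cp <= 0x036F) {
            continue;
        }
        out.push_back(strip_accent(cp));
    }
    return out;
}
```

For example, the code points of "café" map to those of "cafe", and a combining acute accent (U+0301) that follows a plain "e" is dropped. A complete implementation would draw on the full Unicode decomposition data rather than a handful of entries.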
Diffstat (limited to 'examples/server/server.cpp')
0 files changed, 0 insertions, 0 deletions