path: root/unicode.h
Date        Commit message                                                 Author
2024-05-09  llama3 custom regex split (#6965)                              jaime-m-p
2024-05-04  tests : add test-tokenizer-0.sh + fix some tokenizers (#7036)  Georgi Gerganov
2024-04-29  llama : fix BPE pre-tokenization (#6920)                       Georgi Gerganov
2024-03-26  wpm : portable unicode tolower (#6305)                         Jared Van Bortel
2024-03-11  llama : refactor unicode stuff (#5992)                         Georgi Gerganov
2024-03-01  unicode : switch to multimap based nfd_map (#5799)             Douglas Hanley
2024-02-28  llama : improve BERT tokenization (#5740)                      Douglas Hanley
2024-02-26  unicode : reuse iterator (#5726)                               Georgi Gerganov
2024-02-13  tests : multi-thread the tokenizer tests (#5474)               Georgi Gerganov
2024-01-21  add `#include <string>` to unicode.h (#5051)                   bobqianic
2023-10-03  Work on the BPE tokenizer (#3252)                              goerch