Date | Commit message | Author |
---|---|---|
2024-05-09 | llama3 custom regex split (#6965) | jaime-m-p |
2024-05-04 | tests : add test-tokenizer-0.sh + fix some tokenizers (#7036) | Georgi Gerganov |
2024-04-29 | llama : fix BPE pre-tokenization (#6920) | Georgi Gerganov |
2024-03-26 | wpm : portable unicode tolower (#6305) | Jared Van Bortel |
2024-03-11 | llama : refactor unicode stuff (#5992) | Georgi Gerganov |
2024-03-01 | unicode : switch to multimap based nfd_map (#5799) | Douglas Hanley |
2024-02-28 | llama : improve BERT tokenization (#5740) | Douglas Hanley |
2024-02-26 | unicode : reuse iterator (#5726) | Georgi Gerganov |
2024-02-13 | tests : multi-thread the tokenizer tests (#5474) | Georgi Gerganov |
2024-01-21 | add `#include <string>` to unicode.h (#5051) | bobqianic |
2023-10-03 | Work on the BPE tokenizer (#3252) | goerch |