path: root/unicode.cpp
Age         Commit message                                                 Author
2024-07-27  Merge mainline llama.cpp (#3)                                  Kawrakow
2024-06-21  llama : optimize long word tokenization with WPM (#8034)       Georgi Gerganov
2024-06-18  tokenizer : BPE fixes (#7530)                                  jaime-m-p
2024-06-16  unicode : avoid char32_t (#7957)                               Georgi Gerganov
2024-05-18  Unicode codepoint flags for custom regexs (#7245)              jaime-m-p
2024-05-09  llama3 custom regex split (#6965)                              jaime-m-p
2024-05-04  tests : add test-tokenizer-0.sh + fix some tokenizers (#7036)  Georgi Gerganov
2024-04-29  llama : fix BPE pre-tokenization (#6920)                       Georgi Gerganov
2024-03-26  wpm : portable unicode tolower (#6305)                         Jared Van Bortel
2024-03-11  llama : refactor unicode stuff (#5992)                         Georgi Gerganov