author     Galunid <karolek1231456@gmail.com>    2023-11-20 11:35:47 +0100
committer  GitHub <noreply@github.com>           2023-11-20 11:35:47 +0100
commit     f23c0359a32871947169a044eb1dc4dbffd0f405 (patch)
tree       c427d8031c7d41e1ea23eaeaca956f09b94ee54f /tests/test-tokenizer-0-llama.py
parent     40a34fe8d034bd484efd79ccbb95059ca6308dcb (diff)
ci : add flake8 to github actions (python linting) (#4129)
Disabled rules:

* E203 Whitespace before ':' - disabled because we often use 'C' Style where values are aligned
* E211 Whitespace before '(' - disabled because we often use 'C' Style where values are aligned
* E221 Multiple spaces before operator - disabled because we often use 'C' Style where values are aligned
* E225 Missing whitespace around operator - disabled because it's broken so often it seems like a standard
* E231 Missing whitespace after ',', ';', or ':' - disabled because we often use 'C' Style where values are aligned
* E241 Multiple spaces after ',' - disabled because we often use 'C' Style where values are aligned
* E251 Unexpected spaces around keyword / parameter equals - disabled because it's broken so often it seems like a standard
* E261 At least two spaces before inline comment - disabled because it's broken so often it seems like a standard
* E266 Too many leading '#' for block comment - sometimes used as a "section" separator
* E501 Line too long - disabled because it's broken so often it seems like a standard
* E701 Multiple statements on one line (colon) - broken only in convert.py when defining abstract methods (we can use # noqa instead)
* E704 Multiple statements on one line - broken only in convert.py when defining abstract methods (we can use # noqa instead)
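The "C style" mentioned above refers to column-aligned assignments and arguments. A hypothetical snippet (not taken from the repository) illustrates the kind of code that trips the disabled rules, together with an assumed flake8 invocation that skips them; the actual workflow file added by the commit may differ:

    # Hypothetical example of the column-aligned "C style" the commit message
    # refers to; flake8 would flag several of the disabled rules here.
    KEY_NAME    = "general.name"        # E221: multiple spaces before operator
    KEY_CONTEXT = "llama.context_length"

    def describe(name = KEY_NAME):      # E251: spaces around a parameter default's '='
        mapping = {name: 1,"extra": 2}  # E231: missing whitespace after ','
        total=len(mapping) + 1          # E225: missing whitespace around operator
        return mapping, total

    # Assumed CI invocation that ignores the rules listed above:
    #   flake8 --ignore=E203,E211,E221,E225,E231,E241,E251,E261,E266,E501,E701,E704 .

    if __name__ == "__main__":
        print(describe())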
Diffstat (limited to 'tests/test-tokenizer-0-llama.py')
-rw-r--r--   tests/test-tokenizer-0-llama.py   52
1 file changed, 26 insertions(+), 26 deletions(-)
diff --git a/tests/test-tokenizer-0-llama.py b/tests/test-tokenizer-0-llama.py
index 21df8e6e..f3d4d7e3 100644
--- a/tests/test-tokenizer-0-llama.py
+++ b/tests/test-tokenizer-0-llama.py
@@ -14,32 +14,32 @@ dir_tokenizer = args.dir_tokenizer
tokenizer = SentencePieceProcessor(dir_tokenizer + '/tokenizer.model')
tests = [
- "",
- " ",
- " ",
- " ",
- "\t",
- "\n",
- "\t\n",
- "Hello world",
- " Hello world",
- "Hello World",
- " Hello World",
- " Hello World!",
- "Hello, world!",
- " Hello, world!",
- " this is πŸ¦™.cpp",
- "w048 7tuijk dsdfhu",
- "Π½Π΅Ρ‰ΠΎ Π½Π° Π‘ΡŠΠ»Π³Π°Ρ€ΡΠΊΠΈ",
- "αž€αžΆαž“αŸ‹αžαŸ‚αž–αž·αžŸαŸαžŸαž’αžΆαž…αžαž›αž…αŸαž‰",
- "πŸš€ (normal) πŸ˜Άβ€πŸŒ«οΈ (multiple emojis concatenated) βœ… (only emoji that has its own token)",
- "Hello",
- " Hello",
- " Hello",
- " Hello",
- " Hello",
- " Hello\n Hello",
- ]
+ "",
+ " ",
+ " ",
+ " ",
+ "\t",
+ "\n",
+ "\t\n",
+ "Hello world",
+ " Hello world",
+ "Hello World",
+ " Hello World",
+ " Hello World!",
+ "Hello, world!",
+ " Hello, world!",
+ " this is πŸ¦™.cpp",
+ "w048 7tuijk dsdfhu",
+ "Π½Π΅Ρ‰ΠΎ Π½Π° Π‘ΡŠΠ»Π³Π°Ρ€ΡΠΊΠΈ",
+ "αž€αžΆαž“αŸ‹αžαŸ‚αž–αž·αžŸαŸαžŸαž’αžΆαž…αžαž›αž…αŸαž‰",
+ "πŸš€ (normal) πŸ˜Άβ€πŸŒ«οΈ (multiple emojis concatenated) βœ… (only emoji that has its own token)",
+ "Hello",
+ " Hello",
+ " Hello",
+ " Hello",
+ " Hello",
+ " Hello\n Hello",
+]
for text in tests:
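The hunk ends at the loop header, so the loop body is not shown. For context, a minimal sketch of what such a script typically does with the list, assuming it simply prints the SentencePiece encodings so they can be compared against llama.cpp's tokenizer output (an assumed shape, not the script's exact body):

    # Minimal sketch (assumed): load the SentencePiece model the same way the
    # hunk's context line does and print the token IDs for each test string.
    from sentencepiece import SentencePieceProcessor

    dir_tokenizer = "models/llama"  # hypothetical model directory
    tokenizer = SentencePieceProcessor(dir_tokenizer + '/tokenizer.model')

    tests = ["", " ", "Hello world", " this is πŸ¦™.cpp"]  # abbreviated list

    for text in tests:
        ids = tokenizer.encode(text)  # token IDs for the test string
        print(f"{text!r:>24} -> {ids}")
        print(f"{'decoded':>24} -> {tokenizer.decode(ids)!r}")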