author    | Galunid <karolek1231456@gmail.com> | 2023-11-20 11:35:47 +0100
committer | GitHub <noreply@github.com>        | 2023-11-20 11:35:47 +0100
commit    | f23c0359a32871947169a044eb1dc4dbffd0f405 (patch)
tree      | c427d8031c7d41e1ea23eaeaca956f09b94ee54f /tests/test-tokenizer-0-llama.py
parent    | 40a34fe8d034bd484efd79ccbb95059ca6308dcb (diff)
ci : add flake8 to github actions (python linting) (#4129)
Disabled rules:
* E203 Whitespace before ':' - disabled because we often use 'C' style where values are aligned
* E211 Whitespace before '(' - disabled because we often use 'C' style where values are aligned
* E221 Multiple spaces before operator - disabled because we often use 'C' style where values are aligned
* E225 Missing whitespace around operator - disabled because it's broken so often that it has become the de facto standard
* E231 Missing whitespace after ',', ';', or ':' - disabled because we often use 'C' style where values are aligned
* E241 Multiple spaces after ',' - disabled because we often use 'C' style where values are aligned
* E251 Unexpected spaces around keyword / parameter equals - disabled because it's broken so often that it has become the de facto standard
* E261 At least two spaces before inline comment - disabled because it's broken so often that it has become the de facto standard
* E266 Too many leading '#' for block comment - sometimes used as a "section" separator
* E501 Line too long - disabled because it's broken so often that it has become the de facto standard
* E701 Multiple statements on one line (colon) - broken only in convert.py when defining abstract methods (we can use `# noqa` there instead)
* E704 Multiple statements on one line (def) - broken only in convert.py when defining abstract methods (we can use `# noqa` there instead)
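For reference, the rule set above can also be reproduced when linting locally. Below is a minimal sketch using flake8's legacy Python API (`flake8.api.legacy`); the GitHub Actions workflow this commit actually adds is not shown on this page, so the target path and this way of invoking the linter are illustrative assumptions, not the committed configuration.

```python
# Minimal sketch: run flake8 with the same ignore list as the new CI lint.
# Assumes flake8 is installed (pip install flake8). The path below is
# illustrative; the real workflow may lint a different set of files.
from flake8.api import legacy as flake8

IGNORED = [
    "E203", "E211", "E221", "E225", "E231", "E241",
    "E251", "E261", "E266", "E501", "E701", "E704",
]

style_guide = flake8.get_style_guide(ignore=IGNORED)

# check_files() lints the given files/directories and returns a report.
report = style_guide.check_files(["tests/test-tokenizer-0-llama.py"])

# total_errors counts every violation the ignore list did not suppress.
print(f"flake8 found {report.total_errors} violation(s)")
```

The CLI equivalent is `flake8 --ignore E203,E211,E221,E225,E231,E241,E251,E261,E266,E501,E701,E704 .`. For E701/E704, the `# noqa` alternative mentioned above would look like this (a hypothetical one-line abstract method; `set_vocab` is an invented name, not taken from convert.py):

```python
from abc import ABC, abstractmethod

class Model(ABC):
    @abstractmethod
    def set_vocab(self) -> None: ...  # noqa: E704
```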
Diffstat (limited to 'tests/test-tokenizer-0-llama.py')
-rw-r--r-- | tests/test-tokenizer-0-llama.py | 52
1 file changed, 26 insertions(+), 26 deletions(-)
```diff
diff --git a/tests/test-tokenizer-0-llama.py b/tests/test-tokenizer-0-llama.py
index 21df8e6e..f3d4d7e3 100644
--- a/tests/test-tokenizer-0-llama.py
+++ b/tests/test-tokenizer-0-llama.py
@@ -14,32 +14,32 @@ dir_tokenizer = args.dir_tokenizer
 
 tokenizer = SentencePieceProcessor(dir_tokenizer + '/tokenizer.model')
 
 tests = [
-        "",
-        " ",
-        "  ",
-        "   ",
-        "\t",
-        "\n",
-        "\t\n",
-        "Hello world",
-        " Hello world",
-        "Hello World",
-        " Hello World",
-        " Hello World!",
-        "Hello, world!",
-        " Hello, world!",
-        " this is 🦙.cpp",
-        "w048 7tuijk dsdfhu",
-        "нещо на Български",
-        "កាន់តែពិសេសអាចខលចេញ",
-        "🚀 (normal) 😶‍🌫️ (multiple emojis concatenated) ✅ (only emoji that has its own token)",
-        "Hello",
-        " Hello",
-        "  Hello",
-        "   Hello",
-        "    Hello",
-        "    Hello\n    Hello",
-    ]
+    "",
+    " ",
+    "  ",
+    "   ",
+    "\t",
+    "\n",
+    "\t\n",
+    "Hello world",
+    " Hello world",
+    "Hello World",
+    " Hello World",
+    " Hello World!",
+    "Hello, world!",
+    " Hello, world!",
+    " this is 🦙.cpp",
+    "w048 7tuijk dsdfhu",
+    "нещо на Български",
+    "កាន់តែពិសេសអាចខលចេញ",
+    "🚀 (normal) 😶‍🌫️ (multiple emojis concatenated) ✅ (only emoji that has its own token)",
+    "Hello",
+    " Hello",
+    "  Hello",
+    "   Hello",
+    "    Hello",
+    "    Hello\n    Hello",
+]
 
 for text in tests:
```