Age | Commit message (Collapse) | Author |
|
* convert.py: add python logging instead of print()
* convert.py: verbose flag takes priority over dump flag log suppression
* convert.py: named instance logging
* convert.py: use explicit logger id string
* convert.py: convert extra print() to named logger
* convert.py: sys.stderr.write --> logger.error
* *.py: Convert all python scripts to use logging module
* requirements.txt: remove extra line
* flake8: update flake8 ignore and exclude to match ci settings
* gh-actions: add flake8-no-print to flake8 lint step
* pre-commit: add flake8-no-print to flake8 and also update pre-commit version
* convert-hf-to-gguf.py: print() to logger conversion
* *.py: logging basiconfig refactor to use conditional expression
* *.py: removed commented out logging
* fixup! *.py: logging basiconfig refactor to use conditional expression
* constant.py: logger.error then exit should be a raise exception instead
* *.py: Convert logger error and sys.exit() into a raise exception (for atypical error)
* gguf-convert-endian.py: refactor convert_byteorder() to use tqdm progressbar
* verify-checksum-model.py: This is the result of the program, it should be printed to stdout.
* compare-llama-bench.py: add blank line for readability during missing repo response
* reader.py: read_gguf_file() use print() over logging
* convert.py: warning goes to stderr and won't hurt the dump output
* gguf-dump.py: dump_metadata() should print to stdout
* convert-hf-to-gguf.py: print --> logger.debug or ValueError()
* verify-checksum-models.py: use print() for printing table
* *.py: refactor logging.basicConfig()
* gguf-py/gguf/*.py: use __name__ as logger name
Since they will be imported and not run directly.
* python-lint.yml: use .flake8 file instead
* constants.py: logger no longer required
* convert-hf-to-gguf.py: add additional logging
* convert-hf-to-gguf.py: print() --> logger
* *.py: fix flake8 warnings
* revert changes to convert-hf-to-gguf.py for get_name()
* convert-hf-to-gguf-update.py: use triple quoted f-string instead
* *.py: accidentally corrected the wrong line
* *.py: add compilade warning suggestions and style fixes
|
|
* merged the changes from deepseeker models to main branch
* Moved regex patterns to unicode.cpp and updated unicode.h
* Moved header files
* Resolved issues
* added and refactored unicode_regex_split and related functions
* Updated/merged the deepseek coder pr
* Refactored code
* Adding unicode regex mappings
* Adding unicode regex function
* Added needed functionality, testing remains
* Fixed issues
* Fixed issue with gpt2 regex custom preprocessor
* unicode : fix? unicode_wstring_to_utf8
* lint : fix whitespaces
* tests : add tokenizer tests for numbers
* unicode : remove redundant headers
* tests : remove and rename tokenizer test scripts
* tests : add sample usage
* gguf-py : reader prints warnings on duplicate keys
* llama : towards llama3 tokenization support (wip)
* unicode : shot in the dark to fix tests on Windows
* unicode : first try custom implementations
* convert : add "tokenizer.ggml.pre" GGUF KV (wip)
* llama : use new pre-tokenizer type
* convert : fix pre-tokenizer type writing
* lint : fix
* make : add test-tokenizer-0-llama-v3
* wip
* models : add llama v3 vocab file
* llama : adapt punctuation regex + add llama 3 regex
* minor
* unicode : set bomb
* unicode : set bomb
* unicode : always use std::wregex
* unicode : support \p{N}, \p{L} and \p{P} natively
* unicode : try fix windows
* unicode : category support via std::regex
* unicode : clean-up
* unicode : simplify
* convert : add convert-hf-to-gguf-update.py
ggml-ci
* lint : update
* convert : add falcon
ggml-ci
* unicode : normalize signatures
* lint : fix
* lint : fix
* convert : remove unused functions
* convert : add comments
* convert : exercise contractions
ggml-ci
* lint : fix
* cmake : refactor test targets
* tests : refactor vocab tests
ggml-ci
* tests : add more vocabs and tests
ggml-ci
* unicode : cleanup
* scripts : ignore new update script in check-requirements.sh
* models : add phi-3, mpt, gpt-2, starcoder
* tests : disable obsolete
ggml-ci
* tests : use faster bpe test
ggml-ci
* llama : more prominent warning for old BPE models
* tests : disable test-tokenizer-1-bpe due to slowness
ggml-ci
---------
Co-authored-by: Jaggzh <jaggz.h@gmail.com>
Co-authored-by: Kazim Abrar Mahi <kazimabrarmahi135@gmail.com>
|
|
exist (#5821)
|
|
|
|
Disabled rules:
* E203 Whitespace before ':' - disabled because we often use 'C' Style where values are aligned
* E211 Whitespace before '(' (E211) - disabled because we often use 'C' Style where values are aligned
* E221 Multiple spaces before operator - disabled because we often use 'C' Style where values are aligned
* E225 Missing whitespace around operator - disabled because it's broken so often it seems like a standard
* E231 Missing whitespace after ',', ';', or ':' - disabled because we often use 'C' Style where values are aligned
* E241 Multiple spaces after ',' - disabled because we often use 'C' Style where values are aligned
* E251 Unexpected spaces around keyword / parameter equals - disabled because it's broken so often it seems like a standard
* E261 At least two spaces before inline comment - disabled because it's broken so often it seems like a standard
* E266 Too many leading '#' for block comment - sometimes used as "section" separator
* E501 Line too long - disabled because it's broken so often it seems like a standard
* E701 Multiple statements on one line (colon) - broken only in convert.py when defining abstract methods (we can use# noqa instead)
* E704 Multiple statements on one line - broken only in convert.py when defining abstract methods (we can use# noqa instead)
|
|
Signed-off-by: Jiri Podivin <jpodivin@gmail.com>
Co-authored-by: Jiri Podivin <jpodivin@redhat.com>
|
|
* gguf-py: Refactor and add file reading support
* Replay changes from #3871
Credit to @cebtenzzre for that pull
* Various type annotation fixes.
* sort imports with isort (again)
* Fix missing return statement in add_tensor
* style cleanup with flake8
* fix NamedTuple and Enum usage
* Fix an issue with state init in GGUFReader
Move examples to an examples/ directory
Clean up examples
Add an example of modifying keys in a GGUF file
Update documentation with info on examples
Try to support people importing gguf/gguf.py directly
* Damagage is not a word.
* Clean up gguf-py/examples/modify_gguf.py whitespace
Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>
* Update gguf-py/examples/modify_gguf.py formatting
Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>
* Update gguf-py/gguf/gguf_reader.py type hint
Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>
* Make examples executable, formatting changes
* Add more information to GGUFReader and examples comments
* Include a gguf Python package version bump
* Add convert-gguf-endian.py script
* cleanup
* gguf-py : bump minor version
* Reorganize scripts
* Make GGUFReader endian detection less arbitrary
* Add JSON dumping support to gguf-dump.py
Which I kind of regret now
* A few for gguf-dump.py cleanups
* Murder accidental tuple in gguf-py/scripts/gguf-dump.py
Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>
* cleanup
* constants : remove unneeded type annotations
* fix python 3.8 compat
* Set up gguf- scripts in pyproject.toml
* And include scripts/__init__.py, derp
* convert.py: We can't currently support Q8_0 on big endian.
* gguf-py: SpecialVocab: Always try available sources for special token ids
gguf-py: SpecialVocab: Try to load merges from merges.txt if not in tokenizer.json
gguf-py: SpecialVocab: Add 'add_bos_token' type bools to GGUF metadata
u
* cleanup
* Promote add_X_token to GGUF metadata for BOS and EOS
---------
Co-authored-by: Jared Van Bortel <jared@nomic.ai>
Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>
|
|
* Add validation for special token ids to llama.cpp
Small optimization for llama_byte_to_token SPM mode
* Fix BPE newline check, only I could break something so simple
* Killll meeeeee
* Account for GGUF_KEY_KEY only setting when the key exists
* Minor code cleanups.
* Fix convert.py error msg when added tokens are out of range
* Make gguf SpecialVocab vocab size-aware
Update conversion scripts accordingly
* Avoid a string copy
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
|
|
* convert-llama-ggmlv3-to-gguf: Try to handle files older than GGJTv3
* Better error messages for files that cannot be converted
* Add file type to GGUF output
* Rename to convert-llama-ggml-to-gguf.py
* Include original file type information in description
* Improve some informational output
|