ik_llama.cpp.git - Unnamed repository; edit this file 'description' to name the repository.

Age	Commit message (Collapse)	Author
2024-07-27	Merge mainline llama.cpp (#3)	Kawrakow
	* Merging mainline - WIP * Merging mainline - WIP AVX2 and CUDA appear to work. CUDA performance seems slightly (~1-2%) lower as it is so often the case with llama.cpp/ggml after some "improvements" have been made. * Merging mainline - fix Metal * Remove check --------- Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2024-06-18	tokenizer : BPE fixes (#7530)	jaime-m-p
	* Random test: add_bos_token, add_eos_token * Random test: add BPE models for testing * Custom regex split fails with codepoint 0 * Fix falcon punctuation regex * Refactor llm_tokenizer_bpe: move code to constructor * Move 'add_special_bos/eos' logic to llm_tokenizer_bpe * Move tokenizer flags to vocab structure. * Default values for special_add_bos/eos * Build vocab.special_tokens_cache using vocab token types * Generalize 'jina-v2' per token attributes * Fix unicode whitespaces (deepseek-coder, deepseek-llm) * Skip missing byte tokens (falcon) * Better unicode data generation * Replace char32_t with uint32_t
2024-06-18	ggml : sync	Georgi Gerganov

2024-06-13	`build`: rename main → llama-cli, server → llama-server, llava-cli → ↵	Olivier Chafik
	llama-llava-cli, etc... (#7809) * `main`/`server`: rename to `llama` / `llama-server` for consistency w/ homebrew * server: update refs -> llama-server gitignore llama-server * server: simplify nix package * main: update refs -> llama fix examples/main ref * main/server: fix targets * update more names * Update build.yml * rm accidentally checked in bins * update straggling refs * Update .gitignore * Update server-llm.sh * main: target name -> llama-cli * Prefix all example bins w/ llama- * fix main refs * rename {main->llama}-cmake-pkg binary * prefix more cmake targets w/ llama- * add/fix gbnf-validator subfolder to cmake * sort cmake example subdirs * rm bin files * fix llama-lookup-* Makefile rules * gitignore /llama-* * rename Dockerfiles * rename llama\|main -> llama-cli; consistent RPM bin prefixes * fix some missing -cli suffixes * rename dockerfile w/ llama-cli * rename(make): llama-baby-llama * update dockerfile refs * more llama-cli(.exe) * fix test-eval-callback * rename: llama-cli-cmake-pkg(.exe) * address gbnf-validator unused fread warning (switched to C++ / ifstream) * add two missing llama- prefixes * Updating docs for eval-callback binary to use new `llama-` prefix. * Updating a few lingering doc references for rename of main to llama-cli * Updating `run-with-preset.py` to use new binary names. Updating docs around `perplexity` binary rename. * Updating documentation references for lookup-merge and export-lora * Updating two small `main` references missed earlier in the finetune docs. * Update apps.nix * update grammar/README.md w/ new llama-* names * update llama-rpc-server bin name + doc * Revert "update llama-rpc-server bin name + doc" This reverts commit e474ef1df481fd8936cd7d098e3065d7de378930. * add hot topic notice to README.md * Update README.md * Update README.md * rename gguf-split & quantize bins refs in **/tests.sh --------- Co-authored-by: HanClinto <hanclinto@gmail.com>
2024-06-04	common : refactor cli arg parsing (#7675)	Georgi Gerganov
	* common : gpt_params_parse do not print usage * common : rework usage print (wip) * common : valign * common : rework print_usage * infill : remove cfg support * common : reorder args * server : deduplicate parameters ggml-ci * common : add missing header ggml-ci * common : remote --random-prompt usages ggml-ci * examples : migrate to gpt_params ggml-ci * batched-bench : migrate to gpt_params * retrieval : migrate to gpt_params * common : change defaults for escape and n_ctx * common : remove chatml and instruct params ggml-ci * common : passkey use gpt_params
2024-06-04	ggml : remove OpenCL (#7735)	Georgi Gerganov
	ggml-ci
2024-06-04	llama-bench : allow using a different printer for stderr with -oe (#7722)	slaren
	compare-commits.sh : hide stdout, use -oe to print markdown
2024-05-31	scripts: update compare_llama_bench.py [no ci] (#7673)	Johannes Gäßler

2024-05-30	Move convert.py to examples/convert-legacy-llama.py (#7430)	Galunid
	* Move convert.py to examples/convert-no-torch.py * Fix CI, scripts, readme files * convert-no-torch -> convert-legacy-llama * Move vocab thing to vocab.py * Fix convert-no-torch -> convert-legacy-llama * Fix lost convert.py in ci/run.sh * Fix imports * Fix gguf not imported correctly * Fix flake8 complaints * Fix check-requirements.sh * Get rid of ADDED_TOKENS_FILE, FAST_TOKENIZER_FILE * Review fixes
2024-05-29	scripts : remove mpi remnants	Georgi Gerganov

2024-05-29	sync : ggml	Georgi Gerganov

2024-05-20	llama : remove MPI backend (#7395)	slaren

2024-05-18	Unicode codepoint flags for custom regexs (#7245)	jaime-m-p
	* Replace CODEPOINT_TYPE_* with codepoint_flags * Update and bugfix brute force random test * Deterministic brute force random test * Unicode normalization NFD * Get rid of BOM
2024-05-17	Added a single test function script and fix debug-test.sh to be more robust ↵	Brian
	(#7279) * run-single-test.sh: added a single test function script and fix debug-test.sh to be more robust * debug-test.sh: combined execute and gdb test mode via -g flag * debug-test.sh: refactor * debug-test: refactor for clarity * debug-test.sh: comment style changes * debug-test.sh: fix gdb
2024-05-15	sync : ggml	Georgi Gerganov

2024-05-14	script : sync ggml-rpc	Georgi Gerganov

2024-05-14	sync : ggml	Georgi Gerganov
	ggml-ci
2024-05-11	metal : fix warnings (skipme) (#0)	Georgi Gerganov

2024-05-11	sync : ggml	Georgi Gerganov

2024-05-12	Scripting & documenting debugging one test without anything else in the ↵	Josh Ramer
	loop. (#7096) * A little documentation that shares my quick tips for working in the repository. * Update startup-testing-debugging.md * script that shows a menu of tests to pick from & run the debugger on * debug-test.sh: Refactor CLI help message * debug-test.sh: documentation update * debug-test.sh: CLI Help output corrections * debug-test.sh: minor doc fix --------- authored-by: Josh Ramer <ubuntu@ip-172-31-32-53.ec2.internal> Assisted-by: brian khuu <mofosyne@gmail.com>
2024-05-11	sync : ggml	Georgi Gerganov
	ggml-ci
2024-05-10	llama-bench : add pp+tg test type (#7199)	slaren

2024-05-09	llama3 custom regex split (#6965)	jaime-m-p
	* merged the changes from deepseeker models to main branch * Moved regex patterns to unicode.cpp and updated unicode.h * Moved header files * Resolved issues * added and refactored unicode_regex_split and related functions * Updated/merged the deepseek coder pr * Refactored code * Adding unicode regex mappings * Adding unicode regex function * Added needed functionality, testing remains * Fixed issues * Fixed issue with gpt2 regex custom preprocessor * unicode : fix? unicode_wstring_to_utf8 * lint : fix whitespaces * tests : add tokenizer tests for numbers * unicode : remove redundant headers * tests : remove and rename tokenizer test scripts * tests : add sample usage * gguf-py : reader prints warnings on duplicate keys * llama : towards llama3 tokenization support (wip) * unicode : shot in the dark to fix tests on Windows * unicode : first try custom implementations * convert : add "tokenizer.ggml.pre" GGUF KV (wip) * llama : use new pre-tokenizer type * convert : fix pre-tokenizer type writing * lint : fix * make : add test-tokenizer-0-llama-v3 * wip * models : add llama v3 vocab file * llama : adapt punctuation regex + add llama 3 regex * minor * unicode : set bomb * unicode : set bomb * unicode : always use std::wregex * unicode : support \p{N}, \p{L} and \p{P} natively * unicode : try fix windows * unicode : category support via std::regex * unicode : clean-up * unicode : simplify * llama3 custom regex split * convert : add convert-hf-to-gguf-update.py ggml-ci * lint : update * convert : add falcon ggml-ci * unicode : normalize signatures * lint : fix * lint : fix * convert : remove unused functions * convert : add comments * convert : exercise contractions ggml-ci * Using char32_t for codepoints * lint : fix * already exists unicode_tolower() * Typing * Restore BOM * cmake : refactor test targets * tests : refactor vocab tests ggml-ci * tests : add more vocabs and tests ggml-ci * unicode : cleanup * scripts : ignore new update script in check-requirements.sh * Fix merge * models : add phi-3, mpt, gpt-2, starcoder * tests : disable obsolete ggml-ci * tests : use faster bpe test ggml-ci * llama : more prominent warning for old BPE models * tests : disable test-tokenizer-1-bpe due to slowness ggml-ci * Move unused variable value * GPT2 custom regex split * Add alternative regex for custom aplit llama3 Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Style * Add bruteforce random tests for token encoding * wip: fixing unicode codepoint ranges * Fix merge * Unicode tables: separator, lowercase, uppercase and whitespace * llama3 custom regex split: fix \s * Restore BOM * Style * wip: generate NDF table * Ignore special tokens for testing * Clean gen-unicode-data.py * Refactor random tokenizer test * lint : fix * tests : add fail test for llama-bpe --------- Co-authored-by: Jaggzh <jaggz.h@gmail.com> Co-authored-by: Kazim Abrar Mahi <kazimabrarmahi135@gmail.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> Co-authored-by: jaime-m-p <>
2024-05-08	compare-llama-bench.py: add missing basicConfig (#7138)	Brian
	* compare-llama-bench.py: add missing basicConfig * compare-llama-bench.py: Add line break between error message and print_help() * Add regular print() markdown table
2024-05-05	py : logging and flake8 suppression refactoring (#7081)	Brian
	Set one as executable and add basicConfig() to another. Also added noqa tag to test scripts.
2024-05-04	tests : add test-tokenizer-0.sh + fix some tokenizers (#7036)	Georgi Gerganov
	* tests : add test-tokenizer-0.sh * unicode : add all unicode number ranges * starcoder : fix pre-tokenizer * tests : add test that fails with DeepSeek tokenizers * falcon : fix regex * unicode : regenerate unicode tables * refact : add tokenizer model * lint : fix * tests : disable failing tests ggml-ci * refact : add tests files ggml-ci * convert : print -> logging ggml-ci * lint : fix * unicode : digit -> number * phi-3 : update
2024-05-03	convert.py : add python logging instead of print() (#6511)	Brian
	* convert.py: add python logging instead of print() * convert.py: verbose flag takes priority over dump flag log suppression * convert.py: named instance logging * convert.py: use explicit logger id string * convert.py: convert extra print() to named logger * convert.py: sys.stderr.write --> logger.error * .py: Convert all python scripts to use logging module requirements.txt: remove extra line * flake8: update flake8 ignore and exclude to match ci settings * gh-actions: add flake8-no-print to flake8 lint step * pre-commit: add flake8-no-print to flake8 and also update pre-commit version * convert-hf-to-gguf.py: print() to logger conversion * .py: logging basiconfig refactor to use conditional expression .py: removed commented out logging fixup! .py: logging basiconfig refactor to use conditional expression constant.py: logger.error then exit should be a raise exception instead * .py: Convert logger error and sys.exit() into a raise exception (for atypical error) gguf-convert-endian.py: refactor convert_byteorder() to use tqdm progressbar * verify-checksum-model.py: This is the result of the program, it should be printed to stdout. * compare-llama-bench.py: add blank line for readability during missing repo response * reader.py: read_gguf_file() use print() over logging * convert.py: warning goes to stderr and won't hurt the dump output * gguf-dump.py: dump_metadata() should print to stdout * convert-hf-to-gguf.py: print --> logger.debug or ValueError() * verify-checksum-models.py: use print() for printing table * .py: refactor logging.basicConfig() gguf-py/gguf/.py: use __name__ as logger name Since they will be imported and not run directly. python-lint.yml: use .flake8 file instead * constants.py: logger no longer required * convert-hf-to-gguf.py: add additional logging * convert-hf-to-gguf.py: print() --> logger * .py: fix flake8 warnings revert changes to convert-hf-to-gguf.py for get_name() * convert-hf-to-gguf-update.py: use triple quoted f-string instead * .py: accidentally corrected the wrong line *.py: add compilade warning suggestions and style fixes
2024-04-29	llama : fix BPE pre-tokenization (#6920)	Georgi Gerganov
	* merged the changes from deepseeker models to main branch * Moved regex patterns to unicode.cpp and updated unicode.h * Moved header files * Resolved issues * added and refactored unicode_regex_split and related functions * Updated/merged the deepseek coder pr * Refactored code * Adding unicode regex mappings * Adding unicode regex function * Added needed functionality, testing remains * Fixed issues * Fixed issue with gpt2 regex custom preprocessor * unicode : fix? unicode_wstring_to_utf8 * lint : fix whitespaces * tests : add tokenizer tests for numbers * unicode : remove redundant headers * tests : remove and rename tokenizer test scripts * tests : add sample usage * gguf-py : reader prints warnings on duplicate keys * llama : towards llama3 tokenization support (wip) * unicode : shot in the dark to fix tests on Windows * unicode : first try custom implementations * convert : add "tokenizer.ggml.pre" GGUF KV (wip) * llama : use new pre-tokenizer type * convert : fix pre-tokenizer type writing * lint : fix * make : add test-tokenizer-0-llama-v3 * wip * models : add llama v3 vocab file * llama : adapt punctuation regex + add llama 3 regex * minor * unicode : set bomb * unicode : set bomb * unicode : always use std::wregex * unicode : support \p{N}, \p{L} and \p{P} natively * unicode : try fix windows * unicode : category support via std::regex * unicode : clean-up * unicode : simplify * convert : add convert-hf-to-gguf-update.py ggml-ci * lint : update * convert : add falcon ggml-ci * unicode : normalize signatures * lint : fix * lint : fix * convert : remove unused functions * convert : add comments * convert : exercise contractions ggml-ci * lint : fix * cmake : refactor test targets * tests : refactor vocab tests ggml-ci * tests : add more vocabs and tests ggml-ci * unicode : cleanup * scripts : ignore new update script in check-requirements.sh * models : add phi-3, mpt, gpt-2, starcoder * tests : disable obsolete ggml-ci * tests : use faster bpe test ggml-ci * llama : more prominent warning for old BPE models * tests : disable test-tokenizer-1-bpe due to slowness ggml-ci --------- Co-authored-by: Jaggzh <jaggz.h@gmail.com> Co-authored-by: Kazim Abrar Mahi <kazimabrarmahi135@gmail.com>
2024-04-21	`build`: generate hex dump of server assets during build (#6661)	Olivier Chafik
	* `build`: generate hex dumps of server assets on the fly * build: workaround lack of -n on gnu xxd * build: don't use xxd in cmake * build: don't call xxd from build.zig * build: more idiomatic hexing * build: don't use xxd in Makefile (od hackery instead) * build: avoid exceeding max cmd line limit in makefile hex dump * build: hex dump assets at cmake build time (not config time)
2024-04-18	ggml : group all experts in a single ggml_mul_mat_id (#6505)	slaren
	* ggml : group all experts in a single ggml_mul_mat_id cuda : improve mmid row copy * cuda : fix bin bcast with non-cont src0 * test-backend-ops : only run all mul mat tests for base types * llama : disable moe offloading with SYCL --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-04-13	model: support arch `DbrxForCausalLM` (#6515)	Pierrick Hymbert
	* model: dbrx convert to gguf #6344 * llama: support dbrx #6344 * doc: dbrx: add the model as supported * scripts: get-wikitext-2 add unzip * llama: increase maximum experts allowed * llama: factorize moe graph implementation between grok, mixtral and dbrx --------- Co-authored-by: Megha Agarwal <16129366+megha95@users.noreply.github.com>
2024-04-11	scripts : add --outdir option to hf.sh (#6600)	Daniel Bevenius
	* scripts : add --outdir option to hf.sh This commit adds an option to the hf.sh script that allows the user to specify an output directory for the downloaded file. The motivation for this changes is that examples that use the hf.sh script to download models from huggingface can now specify the output directory, perhaps to the `models` directory to keep them in one place and not clutter the root directory. Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com> * squash! scripts : add --outdir option to hf.sh Fix format of the --outdir option in the usage message. Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com> --------- Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
2024-04-09	sync : ggml	Georgi Gerganov

2024-04-09	license : update copyright notice + add AUTHORS (#6405)	Georgi Gerganov
	* license : add AUTHORS * authors : update * scipts : add LICENSE and gen-authors.sh to sync
2024-04-07	sync : ggml	Georgi Gerganov

2024-04-07	scripts : sync ggml-cuda folder	Georgi Gerganov

2024-04-06	sync : ggml	Georgi Gerganov

2024-04-01	compare-llama-bench.py: fix long hexsha args (#6424)	Johannes Gäßler

2024-03-29	sync : ggml (#6351)	Georgi Gerganov
	* sync : ggml ggml-ci * cuda : move GGML_CUDA_DMMV constants to dmmv.cuh --------- Co-authored-by: slaren <slarengh@gmail.com>
2024-03-26	cuda : rename build flag to LLAMA_CUDA (#6299)	slaren

2024-03-23	lookup: complement data from context with general text statistics (#5479)	Johannes Gäßler
	* lookup: evaluation tools, use corpus/previous gens * fixup! lookup: evaluation tools, use corpus/previous gens * fixup! lookup: evaluation tools, use corpus/previous gens * fixup! lookup: evaluation tools, use corpus/previous gens * fixup! lookup: evaluation tools, use corpus/previous gens
2024-03-10	sync : ggml	Georgi Gerganov

2024-03-09	ggml : add ggml-common.h to deduplicate shared code (#5940)	Georgi Gerganov
	* ggml : add ggml-common.h to shared code ggml-ci * scripts : update sync scripts * sycl : reuse quantum tables ggml-ci * ggml : minor * ggml : minor * sycl : try to fix build
2024-03-05	compare-llama-bench.py : remove mul_mat_q (#5892)	slaren

2024-03-04	sync : ggml	Georgi Gerganov
	ggml-ci
2024-03-04	sync : ggml	Georgi Gerganov

2024-03-02	scripts : add pod-llama.sh	Georgi Gerganov

2024-03-01	llama : cleanup unused mmq flags (#5772)	Pierrick Hymbert
	* cleanup unused --no-mul-mat-q,-nommq, -mmq, --mul-mat-q, mul_mat_q * remove: mul_mat_q in compare llama bench and usage * update llama-bench --------- Co-authored-by: slaren <slarengh@gmail.com>
2024-02-28	sync : ggml	Georgi Gerganov

2024-02-22	sync : ggml	Georgi Gerganov