ik_llama.cpp.git

Age	Commit message	Author
2023-10-08	ci : enable on obj-c changes + fix metal build (#3540)	Georgi Gerganov

2023-10-08	zig : fix build by introducing train.cpp (#3539)	Luo Tian

2023-10-08	metal : support MTLGPUFamily < Apple7, formatting, style (#3524)	Georgi Gerganov
	* metal : improve decoding speed for batches of 2-16
	* metal : rename kernels mul_mat_ to mul_mv_
	* metal : indentations
	* minor
	* metal : print more GPU info + disable mul_mm for MTLGPUFamily < Apple7
2023-10-08	llama : fix missing break in Persimmon arch case statements (#3535)	Kerfuffle

2023-10-07	Fix trying to strip newline from empty prompt and cfg prompt file content (#3534)	Kerfuffle

2023-10-07	gguf.py : fix CI for publishing GGUF package (#3532)	M. Yusuf Sarıgöz
	* Fix CI for publishing GGUF package
	* Bump version
	* fix
	* bump version
	* bump version
	* bump version
2023-10-07	py : change version of numpy requirement to 1.24.4 (#3515)	Tom C
	Co-authored-by: Lyjia <me@lyjia.us>
2023-10-07	quantize : fail fast on write errors (#3521)	cebtenzzre

2023-10-07	metal : support default.metallib load & reuse code for swift package (#3522)	Jhen-Jie Hong
	* metal : support load default.metallib & reuse code for swift package
	* metal : use SWIFT_PACKAGE def instead of define GGML_SWIFT
2023-10-07	llm : support Adept Persimmon 8B (#3410)	Phillip Kravtsov
	* Produces garbage output
	* wip: correct tensors up to RoPE
	* correct tensors thru RoPE
	* Correct outputs through masked & softmax'd KQ
	* fp32 works
	* Rename adept->persimmon
	* Produces correct outputs
	* clean up convert scripts
	* remove printing logic from ggml.c
	* remove prints from llama.cpp & fix merge
	* trivial cleanups
	* Add offload funcs
	* update conversion script to directly take adept artifacts rather than .safetensors file
	* Fix norm eps bug
	* Support sqr and concat on metal, persimmon-8b-q4 runs correctly
	* Small changes from review
	* Formatting changes
	* Minor changes to conversion script
	* Remove old script
	* Fix editorconfig formatting
	* Fix build
	* add overlooked offload code
	  ggml-ci
2023-10-07	Fix for #3454 (#3455)	goerch
	Fix: `sentencepiece` tokenizers with added tokens failed with an incorrect assertion
2023-10-06	readme : update models, cuda + ppl instructions (#3510)	BarfingLemurs

2023-10-06	server : docs fix default values and add n_probs (#3506)	Mihai

2023-10-06	kv cache slot search improvements (#3493)	Kerfuffle
	* kv cache slot search improvements
	* Use n_ctx in kv find slot for consistency
	* Ensure kv cache head points to a valid slot in llama_decode internal
	* Add some comments to prevent dumb people (like me) from getting confused.
2023-10-06	prompts : fix editorconfig checks after #3416	Georgi Gerganov

2023-10-06	parallel : add option to load external prompt file (#3416)	pudepiedj
	* Enable external file and add datestamp
	* Add name of external file at end
	* Upload ToK2024
	* Delete ToK2024.txt
	* Experiments with jeopardy
	* Move ParallelQuestions to /prompts and rename
	* Interim commit
	* Interim commit
	* Final revision
	* Remove trailing whitespace
	* remove cmake_all.sh
	* Remove cmake_all.sh
	* Changed .gitignore
	* Improved reporting and new question files.
	* Corrected typo
	* More LLM questions
	* Update LLM-questions.txt
	* Yet more LLM-questions
	* Remove jeopardy results file
	* Reinstate original jeopardy.sh
	* Update examples/parallel/parallel.cpp
	---------
	Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-10-06	server : reuse llama_sample_token common util (#3494)	Jhen-Jie Hong
	* server : reuse llama_sample_token common function
	* common : use n_probs for temperature sampling
2023-10-06	llama : correct hparams comparison (#3446)	l3utterfly
	* fixed floating point comparison issues
	* updated implementation for hparam comparison to handle inf and NaN
	* fixed code review comments
	* minor simplification
	* rename is_float_eq -> is_float_close
	---------
	Co-authored-by: Cebtenzzre <cebtenzzre@gmail.com>
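	The log entry above mentions comparing float hparams with a tolerance while handling inf and NaN. A minimal sketch of that kind of check follows; the name `float_close_sketch` and the choice to return false for a negative tolerance are assumptions made for illustration, not the code from #3446.

```c
#include <math.h>
#include <stdbool.h>

/* Hypothetical helper (not the function from the commit): reports whether
 * two floats are within an absolute tolerance of each other. */
static bool float_close_sketch(float a, float b, float abs_tol) {
    /* Assumption: a negative tolerance is treated as "never close". */
    if (abs_tol < 0.0f) {
        return false;
    }
    /* Exact equality also covers two infinities of the same sign. */
    if (a == b) {
        return true;
    }
    /* Any remaining infinity cannot be close to the other value; returning
     * early also avoids computing inf - inf, which would yield NaN. */
    if (isinf(a) || isinf(b)) {
        return false;
    }
    /* If either input is NaN the difference is NaN, and the comparison
     * below evaluates to false, so NaN is never close to anything. */
    return fabsf(b - a) <= abs_tol;
}
```

	With this sketch, two NaN values compare as not close, and infinities compare as close only when they are equal.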
2023-10-06	ci : fix xcodebuild destinations (#3491)	Jhen-Jie Hong
	* ci : fix xcodebuild destinations
	* ci : add .swift to paths
2023-10-05	convert : update Falcon script for new HF config (#3448)	cebtenzzre
	Also adds Falcon-180B support.
	Closes #3049
	Co-authored-by: jb <jonathan.t.barnard@gmail.com>
2023-10-05	build : use std::make_tuple() for compatibility with older GCC versions (#3488)	Kenvix ⭐

2023-10-05	common : process escape sequences in reverse prompts (#3461)	staviq

2023-10-05	CLBlast: Fix handling of on-device tensor data	shibe2
	Fix uploading tensor data to device, including 3D, 4D, and non-contiguous tensors. Use correct offsets into data that is already in VRAM. Correct handling of OpenCL events when multiple commands are queued.
2023-10-05	server : fix incorrect num_tokens_predicted (#3480)	Jhen-Jie Hong

2023-10-05	swift : disable ACCELERATE_NEW_LAPACK (#3481)	Jhen-Jie Hong

2023-10-05	ci : add swift build via xcodebuild (#3482)	Jhen-Jie Hong

2023-10-04	convert : fix Baichuan2 models by using vocab size in config.json (#3299)	Kerfuffle
	Use local GGUF package when possible in Baichuan converter
2023-10-04	readme : add project status link	Georgi Gerganov

2023-10-04	ggml : fix build after #3329	Georgi Gerganov

2023-10-04	llm : add Refact model (#3329)	ds5t5
	* add refact model
	* resolve comments
	* rebase to the latest
	* solve alibi cpu error
	---------
	Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-10-04	sync : ggml (conv 1d + 2d updates, UB fixes) (#3468)	Georgi Gerganov
	* sync : ggml (conv 1d + 2d updates)
	  ggml-ci
	* ggml : fix UB in q5_0 and q5_1 quantize code
	  ggml.c:1033:39: runtime error: left shift of 1 by 31 places cannot be represented in type 'int'
	  SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior
	  ggml.c:1081:39: runtime error: left shift of 1 by 31 places cannot be represented in type 'int'
	  SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior
	  ggml-ci
	* tests : fix UB in test-quantize-perf
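	The sanitizer message quoted above describes a general C pitfall: shifting the signed `int` constant `1` left by 31 places is undefined behavior. A minimal sketch of the usual remedy, shifting an unsigned value instead, follows; the helper `bit_mask_u32` is hypothetical and is not taken from ggml.c, and the log does not show how the commit itself fixed the issue.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not from ggml.c): returns a 32-bit mask with only
 * the requested bit set. */
static uint32_t bit_mask_u32(unsigned bit) {
    /* Shifting by 32 or more would itself be undefined for a 32-bit type. */
    assert(bit < 32);
    /* `1 << 31` shifts a signed int into its sign bit, which is undefined
     * behavior; starting from an unsigned 32-bit one is well defined. */
    return (uint32_t)1u << bit;
}
```

	Calling `bit_mask_u32(31)` yields the top bit of a 32-bit word without triggering the sanitizer report shown in the log.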
2023-10-04	finetune : readme fix typo (#3465)	Merrick Christensen
	Fix small typo
2023-10-03	ggml : add RISC-V Vector Support for K-Quants and improved the existing intrinsics (#3453)	Tameem
	* Added RVV intrinsics support for Q8 quantize row and also improved the existing dot product function for risc-v.
	  The RVV intrinsics is added for the following quantize row functions:
	    quantize_row_q8_0
	    quantize_row_q8_1
	  The following dot product functions have also been optimized by using LMUL = 1/2 instead of LMUL = 1:
	    ggml_vec_dot_q4_0_q8_0
	    ggml_vec_dot_q4_1_q8_1
	    ggml_vec_dot_q5_0_q8_0
	    ggml_vec_dot_q5_1_q8_1
	  And vector initialization in Q5 by temporary array is also replaced by the vid intrinsics
	  Signed-off-by: Ahmad Tameem <ahmad.tameem@10xengineers.ai>
	* Added RVV intrinsics support for k_quants
	  This adds RISC-V Vector intrinsics support for the following K_quants functions for both QKK = 256 and QKK = 64:
	    ggml_vec_dot_q2_K_q8_K
	    ggml_vec_dot_q3_K_q8_K
	    ggml_vec_dot_q4_K_q8_K
	    ggml_vec_dot_q5_K_q8_K
	    ggml_vec_dot_q6_K_q8_K
	  Signed-off-by: Ahmad Tameem <ahmad.tameem@10xengineers.ai>
	---------
	Signed-off-by: Ahmad Tameem <ahmad.tameem@10xengineers.ai>
2023-10-03	main : consistent prefix/suffix coloring (#3425)	h-h-h-h
	* Typo
	* No `--in-prefix` coloring
	  The `--in-prefix` text was inconsistently colored. Now, it's never colored, just like the `--in-suffix` text.
2023-10-03	llama : fix session saving/loading (#3400)	Georgi Gerganov
	* llama : fix session saving/loading
	* llama : temp fix for clearing "future" tokens from the KV cache
	* llama : fix handling of "future" tokens when loading sessions
	* llama : fix comments for llama_kv_cache API
2023-10-03	llama : expose model's rope_freq_scale in the API (#3418)	Alex Klinkhamer
	so it can be scaled further before creating a context.
2023-10-03	metal : alibi for arbitrary number of heads (#3426)	Jiahao Li

2023-10-03	cmake : make LLAMA_NATIVE flag actually use the instructions supported by the processor (#3273)	Eve
	* fix LLAMA_NATIVE
	* syntax
	* alternate implementation
	* my eyes must be getting bad...
	* set cmake LLAMA_NATIVE=ON by default
	* march=native doesn't work for ios/tvos, so disable for those targets. also see what happens if we use it on msvc
	* revert 8283237 and only allow LLAMA_NATIVE on x86 like the Makefile
	* remove -DLLAMA_MPI=ON
	---------
	Co-authored-by: netrunnereve <netrunnereve@users.noreply.github.com>
2023-10-03	Work on the BPE tokenizer (#3252)	goerch
	* Work on the BPE tokenizer
	  Tokenizer tests work for Falcon-7B
	* Try to fix build problem
	* Fix debug assertion failure
	* Fix MSVC Unicode BOM problem
	* Cleanup and an improvement
	* Fix compiler warning
	* Cleanup
	* Test doesn't work over the full range of Unicodes
	* Update .gitignore and Makefile
	* Another Makefile rule
	* Testing Aquila
	* Moving byte decoding back to `token_to_piece` ...
	  ... because everyone is using it.
	* Guarding some unusable code paths
	* Streamlining code and adding some more assertions
	  Important change: I'm classifying added tokens as control tokens now for BPE.
	* Adding a comment
	* Adding another assertion
	* Fixed vocabulary guarding assertions
	* Fix PR for recent change
	* Fix PR for recent change
	* Fix for compiler warning
	* Fix PR for recent change
	* Fix PR for recent change
	* Fix PR for recent change
	* Fix for compiler warning
	* Fixes for more compiler warnings
	* Remove unused code
	* Fix initialization of static maps
	* Add scores and token types back, adapt gptneox
	* Update llama.cpp
	  Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
	* Update unicode.h
	  Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
	* Update unicode.h
	  Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
	* Ported Starcoder and added some assertions
	* Fix coding style
	* Apply @jploski's fix for missing tokens
	---------
	Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-10-02	convert : fix vocab size when not defined in hparams (#3421)	cebtenzzre

2023-10-02	cmake : increase minimum version for add_link_options (#3444)	cebtenzzre

2023-10-02	CLBlast: Add broadcast support for matrix multiplication (#3402)	shibe2
	Broadcast src0 into src1 across dimensions 2 and 3 when needed. This is required for models that use GQA.
2023-10-02	gguf : add BERT, MPT, and GPT-J arch info (#3408)	cebtenzzre

2023-10-02	gguf : general usability improvements (#3409)	cebtenzzre

2023-10-02	cmake : make CUDA flags more similar to the Makefile (#3420)	cebtenzzre
	* cmake : fix misuse of cxx_flags
	* cmake : make CUDA flags more similar to the Makefile
	* cmake : fix MSVC build
2023-10-02	finetune : fix #3404 (#3437)	xaedes
	the shapes for init model of gqa models was wrong
2023-10-02	metal : set log callback before initializing (#3427)	Adrian

2023-10-02	cmake : fix transient definitions in find pkg (#3411)	bandoti

2023-10-02	docker : ignore Git files (#3314)	Kevin Ji

2023-10-02	infill : add new example + extend server API (#3296)	vvhg1
	* vvhg-code-infill (#1)
	* infill in separate example (#2)
	* reverted changes to main and added infill example
	* cleanup
	* naming improvement
	* make : add missing blank line
	* fix missing semicolon
	* brought infill up to current main code
	* cleanup
	---------
	Co-authored-by: Cebtenzzre <cebtenzzre@gmail.com>