ik_llama.cpp.git

Age	Commit message	Author
2023-10-10	readme : add bloom (#3570)	Xingchen Song(宋星辰)

2023-10-10	llm : add bloom models (#3553)	Xingchen Song(宋星辰)
	* feat: Support bloom models * fix(bloom): fix model size --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-10-10	swift : improvements and fixes (#3564)	Jhen-Jie Hong
	* swift : use macOS 12 as minimum requirement * swift : add missing ggml-backend.c source * swift : add -O3 -DNDEBUG unsafe flags
2023-10-10	llm : add MPT support (#3417)	Jan Ploski
	* CUDA: added support for ggml_clamp (see also: https://github.com/ggerganov/ggml/issues/545)
	* mpt : added an implementation based (mostly) on falcon integration, modified with deltas from ggml/examples/mpt
	* mpt : protect against "clip_qkv": null in mpt-7b
	* mpt : quick fix to avoid "Strange model" warning when quantizing MPT models
	* mpt : addendum to changeset:84e30e8 - leave parameter clamp_kqv out from metadata rather than use 0.0 to indicate "no clamping" (more compliant with the current GGUF spec?)
	* mpt : standardized all tensor names to follow GGUF spec
	* mpt : addendum to changeset:1be89c40 - use "req" parameter of GGUF_GET_KEY macro instead of duplicate code
	* mpt : fixed comment s/gptneox/mpt/
	* mpt : remove tabs, trailing whitespace
	* mpt : removed ne01 + n_past == ne00 assertion from alibi (cuda/f32) and rope_shift from build_mpt
	* mpt : updated convert-mpt-hf-to-gguf.py to reflect changes made to convert-gptneox-hf-to-gguf.py in pr:3252
	* comment out n_past instead of marking it unused
	* mpt : removed hardcoded +178 from convert script in favor of utilizing hparams["vocab_size"]
	* mpt : remove unused tokenizer_json in convert script
	* ggml : remove obsolete n_past assert in ggml_alibi
	* llama : print clamp_kqv and max_alibi_bias hparams
	---------
	Co-authored-by: Cebtenzzre <cebtenzzre@gmail.com>
	Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-10-10	infill : fix tokenization (#3508)	vvhg1
	* infill tokens correction
	* server infill tokens correction
	* removing any leading whitespace from infill suffix and removing leading space token from suffix when params.escape
	* removing any leading whitespace from infill suffix and removing leading space token from suffix when params.escape
	* only rm when params.escape, rm space if possible which is added back or rm added space token
	* only rm when params.escape, rm space if possible which is added back or rm added space token
	* Revert "only rm when params.escape, rm space if possible which is added back or rm added space token"
	This reverts commit 63ba0b621f21077c0e3bc6ba6a327534123cb738.
	* fix interactive prompt escaping and fix server infill leading space handling
	* rm unnecessary bool check
2023-10-09	ggml-alloc : fix assert in debug builds (#3555)	slaren

2023-10-09	refact : fix convert script + zero out KV cache to avoid nans (#3523)	Georgi Gerganov
	* refact : fix convert script + zero out KV cache to avoid nans * ggml : silu(-inf) should never happen * metal : assert various kernel requirements
2023-10-09	metal : do not use mul_mm kernels when ne00 < 64 (#3542)	Georgi Gerganov

2023-10-08	sync : ggml (ggml-backend) (#3548)	Georgi Gerganov
	* sync : ggml (ggml-backend) ggml-ci * zig : add ggml-backend to the build
2023-10-08	ci : add Zig CI/CD and fix build (#2996)	Matheus C. França
	* zig CI/CD and fix build Signed-off-by: Matheus Catarino França <matheus-catarino@hotmail.com> * fix build_compiler * ci : remove trailing whitespace --------- Signed-off-by: Matheus Catarino França <matheus-catarino@hotmail.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-10-08	api_like_OAI.py : compat with Microsoft Guidance (#2746)	Ryder Wishart
	Check for None in addition to empty string check in all request params Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-10-08	api_like_OAI.py : simplify function (#2796)	arcrank
	Simplify function
2023-10-08	k-quants : fix comments about block sizing (#3499)	Johannes Rudolph

2023-10-08	ci : enable on obj-c changes + fix metal build (#3540)	Georgi Gerganov

2023-10-08	zig : fix build by introducing train.cpp (#3539)	Luo Tian

2023-10-08	metal : support MTLGPUFamily < Apple7, formatting, style (#3524)	Georgi Gerganov
	* metal : improve decoding speed for batches of 2-16 * metal : rename kernels mul_mat_ to mul_mv_ * metal : indentations * minor * metal : print more GPU info + disable mul_mm for MTLGPUFamily < Apple7
2023-10-08	llama : fix missing break in Persimmon arch case statements (#3535)	Kerfuffle

2023-10-07	Fix trying to strip newline from empty prompt and cfg prompt file content (#3534)	Kerfuffle

2023-10-07	gguf.py : fix CI for publishing GGUF package (#3532)	M. Yusuf Sarıgöz
	* Fix CI for publishing GGUF package * Bump version * fix * bump version * bump version * bump version
2023-10-07	py : change version of numpy requirement to 1.24.4 (#3515)	Tom C
	Co-authored-by: Lyjia <me@lyjia.us>
2023-10-07	quantize : fail fast on write errors (#3521)	cebtenzzre

2023-10-07	metal : support default.metallib load & reuse code for swift package (#3522)	Jhen-Jie Hong
	* metal : support load default.metallib & reuse code for swift package * metal : use SWIFT_PACKAGE def instead of define GGML_SWIFT
2023-10-07	llm : support Adept Persimmon 8B (#3410)	Phillip Kravtsov
	* Produces garbage output
	* wip: correct tensors up to RoPE
	* correct tensors thru RoPE
	* Correct outputs through masked & softmax'd KQ
	* fp32 works
	* Rename adept->persimmon
	* Produces correct outputs
	* clean up convert scripts
	* remove printing logic from ggml.c
	* remove prints from llama.cpp & fix merge
	* trivial cleanups
	* Add offload funcs
	* update conversion script to directly take adept artifacts rather than .safetensors file
	* Fix norm eps bug
	* Support sqr and concat on metal, persimmon-8b-q4 runs correctly
	* Small changes from review
	* Formatting changes
	* Minor changes to conversion script
	* Remove old script
	* Fix editorconfig formatting
	* Fix build
	* add overlooked offload code
	ggml-ci
2023-10-07	Fix for #3454 (#3455)	goerch
	Fix: `sentencepiece` tokenizers with added tokens failed with an incorrect assertion
2023-10-06	readme : update models, cuda + ppl instructions (#3510)	BarfingLemurs

2023-10-06	server : docs fix default values and add n_probs (#3506)	Mihai

2023-10-06	kv cache slot search improvements (#3493)	Kerfuffle
	* kv cache slot search improvements * Use n_ctx in kv find slot for consistency * Ensure kv cache head points to a valid slot in llama_decode internal * Add some comments to prevent dumb people (like me) from getting confused.
2023-10-06	prompts : fix editorconfig checks after #3416	Georgi Gerganov

2023-10-06	parallel : add option to load external prompt file (#3416)	pudepiedj
	* Enable external file and add datestamp
	* Add name of external file at end
	* Upload ToK2024
	* Delete ToK2024.txt
	* Experiments with jeopardy
	* Move ParallelQuestions to /prompts and rename
	* Interim commit
	* Interim commit
	* Final revision
	* Remove trailing whitespace
	* remove cmake_all.sh
	* Remove cmake_all.sh
	* Changed .gitignore
	* Improved reporting and new question files.
	* Corrected typo
	* More LLM questions
	* Update LLM-questions.txt
	* Yet more LLM-questions
	* Remove jeopardy results file
	* Reinstate original jeopardy.sh
	* Update examples/parallel/parallel.cpp
	---------
	Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-10-06	server : reuse llama_sample_token common util (#3494)	Jhen-Jie Hong
	* server : reuse llama_sample_token common function * common : use n_probs for temperature sampling
2023-10-06	llama : correct hparams comparison (#3446)	l3utterfly
	* fixed floating point comparison issues * updated implementation for hparam comparison to handle inf and NaN * fixed code review comments * minor simplification * rename is_float_eq -> is_float_close --------- Co-authored-by: Cebtenzzre <cebtenzzre@gmail.com>
2023-10-06	ci : fix xcodebuild destinations (#3491)	Jhen-Jie Hong
	* ci : fix xcodebuild destinations * ci : add .swift to paths
2023-10-05	convert : update Falcon script for new HF config (#3448)	cebtenzzre
	Also adds Falcon-180B support. Closes #3049 Co-authored-by: jb <jonathan.t.barnard@gmail.com>
2023-10-05	build : use std::make_tuple() for compatibility with older GCC versions (#3488)	Kenvix ⭐

2023-10-05	common : process escape sequences in reverse prompts (#3461)	staviq

2023-10-05	CLBlast: Fix handling of on-device tensor data	shibe2
	Fix uploading tensor data to device, including 3D, 4D, and non-contiguous tensors. Use correct offsets into data that is already in VRAM. Correct handling of OpenCL events when multiple commands are queued.
2023-10-05	server : fix incorrect num_tokens_predicted (#3480)	Jhen-Jie Hong

2023-10-05	swift : disable ACCELERATE_NEW_LAPACK (#3481)	Jhen-Jie Hong

2023-10-05	ci : add swift build via xcodebuild (#3482)	Jhen-Jie Hong

2023-10-04	convert : fix Baichuan2 models by using vocab size in config.json (#3299)	Kerfuffle
	Use local GGUF package when possible in Baichuan converter
2023-10-04	readme : add project status link	Georgi Gerganov

2023-10-04	ggml : fix build after #3329	Georgi Gerganov

2023-10-04	llm : add Refact model (#3329)	ds5t5
	* add refact model * resolve comments * rebase to the latest * solve alibi cpu error --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-10-04	sync : ggml (conv 1d + 2d updates, UB fixes) (#3468)	Georgi Gerganov
	* sync : ggml (conv 1d + 2d updates)
	ggml-ci
	* ggml : fix UB in q5_0 and q5_1 quantize code
	ggml.c:1033:39: runtime error: left shift of 1 by 31 places cannot be represented in type 'int'
	SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior
	ggml.c:1081:39: runtime error: left shift of 1 by 31 places cannot be represented in type 'int'
	SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior
	ggml-ci
	* tests : fix UB in test-quantize-perf
2023-10-04	finetune : readme fix typo (#3465)	Merrick Christensen
	Fix small typo
2023-10-03	ggml : add RISC-V Vector Support for K-Quants and improved the existing intrinsics (#3453)	Tameem
	* Added RVV intrinsics support for Q8 quantize row and also improved the existing dot product function for risc-v.
	The RVV intrinsics is added for the following quantize row functions
	quantize_row_q8_0
	quantize_row_q8_1
	The following dot product functions have also been optimized by using LMUL = 1/2 instead of LMUL = 1
	ggml_vec_dot_q4_0_q8_0
	ggml_vec_dot_q4_1_q8_1
	ggml_vec_dot_q5_0_q8_0
	ggml_vec_dot_q5_1_q8_1
	And vector initialization in Q5 by temporary array is also replaced by the vid intrinsics
	Signed-off-by: Ahmad Tameem <ahmad.tameem@10xengineers.ai>
	* Added RVV intrinsics support for k_quants
	This adds RISC-V Vector intrinsics support for the following K_quants functions for both QKK = 256 and QKK = 64
	ggml_vec_dot_q2_K_q8_K
	ggml_vec_dot_q3_K_q8_K
	ggml_vec_dot_q4_K_q8_K
	ggml_vec_dot_q5_K_q8_K
	ggml_vec_dot_q6_K_q8_K
	Signed-off-by: Ahmad Tameem <ahmad.tameem@10xengineers.ai>
	---------
	Signed-off-by: Ahmad Tameem <ahmad.tameem@10xengineers.ai>
2023-10-03	main : consistent prefix/suffix coloring (#3425)	h-h-h-h
	* Typo * No `--in-prefix` coloring The `--in-prefix` text was inconsistently colored. Now, it's never colored, just like the `--in-suffix` text.
2023-10-03	llama : fix session saving/loading (#3400)	Georgi Gerganov
	* llama : fix session saving/loading * llama : temp fix for clearing "future" tokens from the KV cache * llama : fix handling of "future" tokens when loading sessions * llama : fix comments for llama_kv_cache API
2023-10-03	llama : expose model's rope_freq_scale in the API (#3418)	Alex Klinkhamer
	so it can be scaled further before creating a context.
2023-10-03	metal : alibi for arbitrary number of heads (#3426)	Jiahao Li