2023-10-18 metal : implement q5_0 and q5_1 kernels (#3648) (Jhen-Jie Hong)
* metal : implement dequantize_q5_0
* metal : block_q_n_dot_y for block_q5_0 (broken)
* metal : revert unnecessary change
* metal : implement dequantize_q5_1
* metal : block_q_n_dot_y for q5_1 (broken)
* metal : fix block_q_n_dot_y
* minor : spaces / formatting

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-10-18 opencl : fix element-wise multiplication (#3656) (shibe2)
2023-10-17 fix embeddings when using CUDA (#3657) (slaren)
2023-10-17 llama : avoid fprintf in favor of LLAMA_LOG (#3538) (Georgi Gerganov)
2023-10-17 readme : update hot-topics & models, detail windows release in usage (#3615) (BarfingLemurs)
* Update README.md
* Update README.md
* Update README.md
* move "Running on Windows" section below "Prepare data and run"

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-10-17 CLBlast: Fix temporary buffer size for f16 conversion (wsize) (shibe2)
Fix buffer overflow. Reduce the size to fit just one 2D slice. Assert sufficient size.
2023-10-17 train-text-from-scratch : fix assert failure in ggml-alloc (#3618) (slaren)
2023-10-17 editorconfig : remove trailing spaces (Georgi Gerganov)
2023-10-17 server : documentation of JSON return value of /completion endpoint (#3632) (coezbek)
* Added documentation of JSON return value of /completion endpoint
* Update examples/server/README.md

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-10-17 save-load-state : fix example + add ci test (#3655) (Georgi Gerganov)
* save-load-state : fix example (close #3606)
* ci : add test for save-load-state example

ggml-ci
2023-10-17 readme : add Aquila2 links (#3610) (ldwang)
Signed-off-by: ldwang <ftgreat@gmail.com>
Co-authored-by: ldwang <ftgreat@gmail.com>
2023-10-17 tokenizer : special token handling (#3538) (staviq)
* Rewrite special token handling from #1931
* shorten param name, add st verification by type
* use offsets instead of copy by substr
* formatting, remove copying iterator on delete
* llama : normalize code-style
* swift fix
* print pfx/sfx if verb, main: split pfx input sfx
* don't add space when using special tokens
* minor : comment + spacing

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-10-17 k-quants : fix quantization ranges (#3646) (Georgi Gerganov)
2023-10-16 llava : fix tokenization to not add bos between image embeddings and user prompt (#3645) (Georgi Gerganov)
* llava : fix tokenization to not add bos after system prompt
* set seed

Co-authored-by: M. Yusuf Sarıgöz <yusufsarigoz@gmail.com>
2023-10-15 MPT : support GQA for replit-code-v1.5 (#3627) (cebtenzzre)
2023-10-14 Honor -ngl option for CUDA offloading in llava (#3621) (M. Yusuf Sarıgöz)
2023-10-13 llama : remove n_threads from llama_decode_internal (#3614) (Daniel Bevenius)
This commit removes `n_threads` from the `llama_decode_internal` function's doc comment, as the parameter no longer exists. It looks like it was removed in commit 16bc66d9479edd5ee12ec734973554d4493c5dfa ("llama.cpp : split llama_context_params into model and context params").

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
2023-10-13 ggml : add context enumeration functions (#3605) (slaren)
finetune : fix assert failure in ggml-alloc
2023-10-12 CLBlast: Fix matrix-vector multiplication (#3544) (shibe2)
2023-10-12 examples: support LLaVA v1.5 (multimodal model) (#3436) (M. Yusuf Sarıgöz)
* WIP: start implementing LLaVA
* rm scratch buf for now, will revert after cleanup
* LLaVA image encoder is working. will combine with llama
* Add llava inference code, but it's buggy. debugging
* LLaVA is working e2e, needs to optimize memory allocation + cleanup
* Use ggml_allocr + rm unnecessary code
* fix: crlf -> lf
* fix: new line at EoF
* fix: trailing whitespace
* Add readme
* Update readme
* Some cleanup
* Are you happy editorconfig?
* rm unused batch image preprocessing
* rm unused import
* fix: rm designated initializers
* introduce pad-to-square mode for non-square images
* are you happy editorconfig?
* gitignore /llava
* Handle cases where image file does not exist
* add llava target to Makefile
* add support for 13b model variant
* Maybe seed is unlucky?
* Check if apples are compared to apples
* are you happy editorconfig?
* Use temperature = 0.1 by default
* command line: use gpt_params_parse()
* minor
* handle default n_predict
* fix typo
* llava : code formatting, rename files, fix compile warnings
* do not use Wno-cast-qual for MSVC

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-10-12 docs : fix typo GOMP_CPU_AFFINITY (#3597) (uint256_t)
2023-10-12 cmake : fix add_compile_options on macOS (Georgi Gerganov)
2023-10-12 typo : it is `--n-gpu-layers` not `--gpu-layers` (#3592) (Ian Scrivener)
Fixed a typo in the macOS Metal run documentation.
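For context, a hypothetical invocation using the corrected flag (the model path and layer count here are illustrative assumptions, not from the commit):

```shell
# The flag is --n-gpu-layers (short form -ngl), not --gpu-layers.
# Offload 32 layers to the GPU (e.g. Metal on macOS); model path is hypothetical.
./main -m ./models/llama-2-7b.Q4_0.gguf -p "Hello" --n-gpu-layers 32
```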
2023-10-12 ci : check if there is enough VRAM (#3596) (Georgi Gerganov)
ggml-ci
2023-10-12 server : add completion mode (no chat) (#3582) (Aarni Koskela)
2023-10-12 prompts : add mnemonics.txt (Georgi Gerganov)
2023-10-12 server : fix kv cache management (#3588) (Georgi Gerganov)
2023-10-11 main : fix session loading bug (#3400) (Georgi Gerganov)
2023-10-11 server : add parameter -tb N, --threads-batch N (#3584) (Michael Coppola)
Co-authored-by: Michael Coppola <info@michaeljcoppola.com>
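A hypothetical usage of the new flag, which lets the batch (prompt-processing) thread count differ from the generation thread count; the model path and thread counts are illustrative assumptions:

```shell
# -t / --threads controls generation threads; the new -tb / --threads-batch
# controls threads used for batch and prompt processing.
./server -m ./models/model.gguf -t 8 -tb 16
```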
2023-10-11 common : fix mirostat state when using multiple sequences (#3543) (Kerfuffle)
* Fix mirostat state when using multiple sequences
* Fix mirostat by completely refactoring sampling!
* Try to fix zig build.
* Export function to fetch/create default sampler states
  Code formatting cleanups and add some comments
  Silence a warning about id not being used when logging is disabled
* Apply some renaming suggestions.
  Fix comments that were out of sync with the pull.
* Use more consistent naming convention for sampling contexts
2023-10-11 batched : add bench tool (#3545) (Georgi Gerganov)
* batched : add bench tool
* batched : minor fix table
* batched-bench : add readme + n_kv_max is now configurable
* batched-bench : init warm-up batch
* batched-bench : pass custom set of PP, TG and PL
* batched-bench : add mmq CLI arg
2023-10-11 examples : add batched.swift + improve CI for swift (#3562) (Zane Shannon)
2023-10-10 Add MPT model to supported models in README.md (#3574) (Galunid)
2023-10-10 Minor improvements in GPT2 tokenizer (#3567) (goerch)
* Fixing minor bugs in bpe_gpt2_preprocess
* Don't add bos token in test
2023-10-10 readme : add bloom (#3570) (Xingchen Song(宋星辰))
2023-10-10 llm : add bloom models (#3553) (Xingchen Song(宋星辰))
* feat: Support bloom models
* fix(bloom): fix model size

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-10-10 swift : improvements and fixes (#3564) (Jhen-Jie Hong)
* swift : use macOS 12 as minimum requirement
* swift : add missing ggml-backend.c source
* swift : add -O3 -DNDEBUG unsafe flags
2023-10-10 llm : add MPT support (#3417) (Jan Ploski)
* CUDA: added support for ggml_clamp (see also: https://github.com/ggerganov/ggml/issues/545)
* mpt : added an implementation based (mostly) on falcon integration, modified with deltas from ggml/examples/mpt
* mpt : protect against "clip_qkv": null in mpt-7b
* mpt : quick fix to avoid "Strange model" warning when quantizing MPT models
* mpt : addendum to changeset:84e30e8 - leave parameter clamp_kqv out from metadata rather than use 0.0 to indicate "no clamping" (more compliant with the current GGUF spec?)
* mpt : standardized all tensor names to follow GGUF spec
* mpt : addendum to changeset:1be89c40 - use "req" parameter of GGUF_GET_KEY macro instead of duplicate code
* mpt : fixed comment s/gptneox/mpt/
* mpt : remove tabs, trailing whitespace
* mpt : removed ne01 + n_past == ne00 assertion from alibi (cuda/f32) and rope_shift from build_mpt
* mpt : updated convert-mpt-hf-to-gguf.py to reflect changes made to convert-gptneox-hf-to-gguf.py in pr:3252
* comment out n_past instead of marking it unused
* mpt : removed hardcoded +178 from convert script in favor of utilizing hparams["vocab_size"]
* mpt : remove unused tokenizer_json in convert script
* ggml : remove obsolete n_past assert in ggml_alibi
* llama : print clamp_kqv and max_alibi_bias hparams

Co-authored-by: Cebtenzzre <cebtenzzre@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-10-10 infill : fix tokenization (#3508) (vvhg1)
* infill tokens correction
* server infill tokens correction
* removing any leading whitespace from infill suffix and removing leading space token from suffix when params.escape
* removing any leading whitespace from infill suffix and removing leading space token from suffix when params.escape
* only rm when params.escape, rm space if possible which is added back or rm added space token
* only rm when params.escape, rm space if possible which is added back or rm added space token
* Revert "only rm when params.escape, rm space if possible which is added back or rm added space token"
  This reverts commit 63ba0b621f21077c0e3bc6ba6a327534123cb738.
* fix interactive prompt escaping and fix server infill leading space handling
* rm unnecessary bool check
2023-10-09 ggml-alloc : fix assert in debug builds (#3555) (slaren)
2023-10-09 refact : fix convert script + zero out KV cache to avoid nans (#3523) (Georgi Gerganov)
* refact : fix convert script + zero out KV cache to avoid nans
* ggml : silu(-inf) should never happen
* metal : assert various kernel requirements
2023-10-09 metal : do not use mul_mm kernels when ne00 < 64 (#3542) (Georgi Gerganov)
2023-10-08 sync : ggml (ggml-backend) (#3548) (Georgi Gerganov)
* sync : ggml (ggml-backend)
* zig : add ggml-backend to the build

ggml-ci
2023-10-08 ci : add Zig CI/CD and fix build (#2996) (Matheus C. França)
* zig CI/CD and fix build
* fix build_compiler
* ci : remove trailing whitespace

Signed-off-by: Matheus Catarino França <matheus-catarino@hotmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-10-08 api_like_OAI.py : compat with Microsoft Guidance (#2746) (Ryder Wishart)
Check for None in addition to the empty-string check in all request params.

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-10-08 api_like_OAI.py : simplify function (#2796) (arcrank)
2023-10-08 k-quants : fix comments about block sizing (#3499) (Johannes Rudolph)
2023-10-08 ci : enable on obj-c changes + fix metal build (#3540) (Georgi Gerganov)
2023-10-08 zig : fix build by introducing train.cpp (#3539) (Luo Tian)
2023-10-08 metal : support MTLGPUFamily < Apple7, formatting, style (#3524) (Georgi Gerganov)
* metal : improve decoding speed for batches of 2-16
* metal : rename kernels mul_mat_ to mul_mv_
* metal : indentations
* minor
* metal : print more GPU info + disable mul_mm for MTLGPUFamily < Apple7