Age | Commit message | Author
2024-05-22 | build : remove zig (#7471) | Georgi Gerganov
2024-05-22 | common : normalize naming style (#7462) | Georgi Gerganov
    * common : normalize naming style (ggml-ci)
    * common : match declaration / definition order
    * zig : try to fix build
2024-05-22 | CUDA: fix FA out-of-bounds writes (#7465) | Johannes Gäßler
2024-05-22 | phi3 : duplicate rope factors in each layer (#7447) | slaren
    * phi3 : duplicate rope factors in each layer
    * phi3 : set phi-3 model type as 14B
    * model loader : simplify the process for duplicating model tensors
    * llama-bench : remove default pg test
    * replace bool parameters in llama_model_loader with named flags
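    The "named flags" change replaces boolean parameters, which are unreadable at
    call sites, with self-documenting bit flags. A minimal sketch of the pattern,
    with hypothetical names rather than the actual llama_model_loader API:

        // hypothetical flags standing in for opaque bool parameters
        enum loader_tensor_flags : int {
            TENSOR_NOT_REQUIRED = 1 << 0, // a missing tensor is not an error
            TENSOR_DUPLICATED   = 1 << 1, // reuse the same weights in every layer
        };

        // before: create_tensor(name, ne, /*required*/ false, /*duplicated*/ true);
        // after, the call site documents itself:
        //         create_tensor(name, ne, TENSOR_NOT_REQUIRED | TENSOR_DUPLICATED);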
2024-05-22 | vulkan: add workaround for iterator boundary check to fix clang-cl debug build (#7426) | k.h.lai
2024-05-22 | llama : add missing model type names (#7445) | Justine Tunney
2024-05-22 | cuda : fix compile warning (#7454) | Georgi Gerganov
2024-05-22 | CUDA: remove incorrect precision check (#7454) | Johannes Gäßler
2024-05-22 | cuda : fix rope + add tests (#7452) | Georgi Gerganov
    * cuda : fix rope pos data (ggml-ci)
    * ggml : drop mode & 1 == 1 support for ggml_rope (ggml-ci)
    * ggml : support freq_factors for f16 rope (CPU) (ggml-ci)
    * tests : add rope tests using frequency factors (ggml-ci)
2024-05-21 | llama : add phi3 128K model support (#7225) | liuwei-git
    * add phi3 128k support in convert-hf-to-gguf
    * add phi3 128k support in cuda
    * address build warnings on llama.cpp
    * adjust index value in cuda long rope freq factors
    * add long rope support in ggml cpu backend
    * make freq factors only depend on ctx size
    * remove unused rope scaling type 'su' from gguf converter
    * fix lint warnings on convert-hf-to-gguf.py
    * use the short freq factors when the context size is smaller than the trained context size
    * add one line of comments
    * metal : support rope freq_factors
    * ggml : update ggml_rope_ext API to support freq. factors
    * backends : add dev messages to support rope freq. factors
    * minor : style
    * tests : update to use new rope API
    * backends : fix pragma semicolons
    * minor : cleanup
    * llama : move rope factors from KV header to tensors
    * llama : remove tmp assert
    * cuda : fix compile warning
    * convert : read/write n_head_kv
    * llama : fix uninitialized tensors

    Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
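    Conceptually, the long-rope frequency factors shrink each RoPE dimension's
    rotation angle so the model can cover a longer context than it was trained on.
    A hedged scalar sketch of where a per-dimension factor enters the rotation
    (variable names are illustrative, not the ggml kernel itself):

        #include <math.h>

        // compute the angle for each dimension pair (2i, 2i+1) at position `pos`
        void rope_angles(float * theta, const float * freq_factors,
                         int n_dims, float freq_base, float pos) {
            for (int i = 0; i < n_dims / 2; ++i) {
                // standard RoPE angle ...
                float t = pos * powf(freq_base, -2.0f * i / n_dims);
                // ... divided by the learned per-dimension frequency factor
                theta[i] = t / freq_factors[i];
            }
        }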
2024-05-21 | metal : handle F16 inf values, fix FA partial offload (#7434) | Georgi Gerganov
    ggml-ci
2024-05-21 | `grammars`: fix resampling logic regression (#7424) | Olivier Chafik
2024-05-21 | CUDA: fix unused warning in mmq.cu (#7442) | Johannes Gäßler
2024-05-21 | tests : test-tokenizer-0.sh print more info (#7402) | Georgi Gerganov
2024-05-21 | examples: cache hf model when --model not provided (#7353) | Amir
2024-05-21 | CUDA: deduplicate mmq code (#7397) | Johannes Gäßler
2024-05-21 | Tokenizer SPM fixes for phi-3 and llama-spm (bugfix) (#7425) | jaime-m-p
    * Update brute force test: add_special
    * Update brute force test: default values for add_bos_token and add_eos_token
    * Enable rtrim when pre-inserting BOS
      Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
    * Revert "server : fix test regexes"
2024-05-20 | Tokenizer SPM fixes for phi-3 and llama-spm (#7375) | jaime-m-p
    * Update brute force test: special tokens
    * Fix added tokens
      - Try to read 'added_tokens.json'.
      - Try to read 'tokenizer_config.json'.
      - Try to read 'tokenizer.json'.
    * Fix special tokens rtrim
      Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
    * server : fix test regexes
2024-05-21 | llama : remove Persimmon (#7408) | Georgi Gerganov
    * llama : remove Persimmon
    * requirements : remove
2024-05-20 | perplexity: update README FP16 results [no ci] (#7413) | Johannes Gäßler
2024-05-20 | rpc : track allocated buffers (#7411) | Radoslav Gerganov
    * rpc : track allocated buffers (ref: #7407)
    * rpc : pack rpc_tensor tightly
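    "Pack tightly" here means giving the struct a fixed, padding-free layout so it
    can be copied byte-for-byte into RPC messages and decoded identically on every
    platform. A sketch of the general technique; the fields below are illustrative,
    not the actual rpc_tensor definition:

        #include <stdint.h>

        #pragma pack(push, 1)          // no compiler-inserted padding
        struct rpc_tensor_example {
            uint64_t id;               // remote tensor handle
            uint32_t type;             // ggml type enum
            uint64_t ne[4];            // element count per dimension
            char     name[64];         // tensor name, NUL-padded
        };
        #pragma pack(pop)
        // sizeof() is now identical on every platform, so the struct
        // can be memcpy'd straight into the wire buffer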
2024-05-20 | server : fix temperature + disable some tests (#7409) | Georgi Gerganov
    * server : fix temperature
    * server : disable tests relying on parallel determinism
    * ci : change server Debug -> RelWithDebInfo
2024-05-20 | [SYCL] Update SYCL upscale operation (#7321) | AidanBeltonS
    * Update SYCL upscale operation
    * Formatting
    * Remove messages
2024-05-20 | Update README.md (#7410) | Bingan
2024-05-20 | ggml-opencl, llama: using reserve() if count already known (#7272) | Herman Semenov
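    reserve() pre-allocates a std::vector's storage in one step, avoiding the
    repeated grow-and-copy cycles that push_back otherwise triggers. A small
    self-contained example of the pattern this commit applies:

        #include <vector>

        void append_n(std::vector<int> & out, int count) {
            out.reserve(out.size() + count); // one allocation up front
            for (int i = 0; i < count; ++i) {
                out.push_back(i);            // no reallocation inside the loop
            }
        }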
2024-05-20 | ggml : add loongarch lsx and lasx support (#6454) | junchao-loongson
    * add loongarch lsx and lasx optimized code
    * add loongarch compilation support to makefile
    * revert stb_image.h
    * optimize bytes_from_nibbles_32 and sum_i16_pairs_float
    * fix undeclared
    * format code
    * update
    * update 2

    Co-authored-by: Jinyang He <hejinyang@loongson.cn>
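    bytes_from_nibbles_32 is the hot unpacking step for 4-bit quantized blocks,
    which the LSX/LASX work vectorizes. As a scalar reference for what the SIMD
    path computes (a sketch matching ggml's low-nibbles-then-high-nibbles block
    layout, not the LoongArch code itself):

        #include <stdint.h>

        // expand 16 packed bytes into the 32 nibble values of a 4-bit block
        void bytes_from_nibbles_32_ref(const uint8_t * in, uint8_t * out) {
            for (int i = 0; i < 16; ++i) {
                out[i]      = in[i] & 0x0F; // low nibbles first
                out[i + 16] = in[i] >> 4;   // then high nibbles
            }
        }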
2024-05-20 | server : tuning tests (#7388) | Georgi Gerganov
    * server : don't pass temperature as string
    * server : increase timeout
    * tests : fix the fix, 0.8f -> 0.8 (ggml-ci)
    * tests : set explicit temperature
2024-05-20 | server : return error on too large embedding input (#7389) | Georgi Gerganov
2024-05-20 | tests : fix --keep_split -> --keep-split (#7374) | Georgi Gerganov
2024-05-20 | Add provisions for Windows support for BF16 code, including a CMake provision for enabling AVX512_BF16 (#7258) | Srihari-mcw
2024-05-20 | llama : remove MPI backend (#7395) | slaren
2024-05-19 | quantize : fix --keep-split check (#7374) | Fred Douglas
2024-05-19 | Vulkan Embedding Fix (#7360) | 0cc4m
    * Fix empty Vulkan host buffers
      Add fp32 fp16 matmul shader
      Fix matmul shader alignment
    * Remove deprecated tensor->backend uses
    * Fix Vulkan validation errors on embedding models with no offloaded layers
    * Fix Vulkan llava segfault when not offloading layers
2024-05-19 | ggml : fix another case of quants nans (#7387) | slaren
2024-05-19 | ggml: implement quantized KV cache for FA (#7372) | Johannes Gäßler
2024-05-19 | server: add test for token probs (#7347) | Johannes Gäßler
2024-05-19 | server: fix seed being reported back (#7382) | Johannes Gäßler
2024-05-19 | Add StableLM2 pre-tokenizer (#7349) | Anas Ahouzi
    * Add StableLM pre-tokenizer
    * Fix space
    * Fix trailing whitespace
2024-05-19 | cuda : clear error after buffer allocation failure (#7376) | slaren
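    CUDA keeps the most recent error in sticky per-thread state, so a failed
    cudaMalloc would otherwise make later error checks report the stale allocation
    failure. A minimal sketch of the clear-after-failure pattern (not the exact
    llama.cpp code):

        #include <cuda_runtime.h>
        #include <stdio.h>

        void * try_cuda_alloc(size_t size) {
            void * ptr = nullptr;
            cudaError_t err = cudaMalloc(&ptr, size);
            if (err != cudaSuccess) {
                (void) cudaGetLastError(); // reset the sticky error state
                fprintf(stderr, "cudaMalloc of %zu bytes failed: %s\n",
                        size, cudaGetErrorString(err));
                return nullptr;            // caller may retry with a smaller size
            }
            return ptr;
        }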
2024-05-19 | labeler.yml: Use settings from ggerganov/llama.cpp [no ci] (#7363) | Brian
    https://github.com/actions/labeler#using-configuration-path-input-together-with-the-actionscheckout-action
    recommends using the checkout action so the labeler runs with the correct repo
    context when applying settings for PR labels, e.g.:

        steps:
        - uses: actions/checkout@v4 # uploads repository content to the runner
          with:
            repository: "owner/repositoryName" # one of the available inputs; see https://github.com/actions/checkout#readme for more
        - uses: actions/labeler@v5
          with:
            configuration-path: 'path/to/the/uploaded/configuration/file'
2024-05-19 | cmake : update android comments (#7341) | Georgi Gerganov
2024-05-19 | Capture CUDA logging output (#7298) | fraxy-v
    * logging: output capture in cuda module
    * fix compile error
    * fix: vsnprintf terminates with 0, string use not correct
    * post review
    * Update llama.cpp

    Co-authored-by: slaren <slarengh@gmail.com>
2024-05-18 | ci : re-enable sanitizer runs (#7358) | Georgi Gerganov
    * Revert "ci : temporary disable sanitizer builds (#6128)"
      This reverts commit 4f6d1337ca5a409dc74aca8c479b7c34408a69c0.
    * ci : trigger
2024-05-18 | android : use "ci-android" branch for CI (#7341) | Georgi Gerganov
    * android : use "ci-android" branch for CI
    * ggml : disable SIMD exp and silu for 32-bit ARM (ggml-ci)
    * android : do not fetch, use add_subdirectory instead
    * cmake : provide binary dir
2024-05-18 | CUDA: deduplicate FlashAttention code (#7352) | Johannes Gäßler
2024-05-18 | server: correct --threads documentation [no ci] (#7362) | Johannes Gäßler
2024-05-18 | cuda : add half2 __shfl_xor() for ROCm 5.5 (#7263) | Engininja2
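    Older ROCm releases lack a half2 overload of __shfl_xor(), so CUDA kernels
    relying on it fail to build for HIP. One plausible shape of such a shim,
    shuffling the raw 32-bit payload instead (a hedged sketch, not the exact
    patch; requires the cuda_fp16.h / hip_fp16.h half2 type):

        // assumption: targets where the half2 overload is missing (e.g. ROCm 5.5)
        static __device__ __forceinline__ half2 shfl_xor_half2(half2 var, int lane_mask, int width) {
            // a half2 is 32 bits wide, so shuffle it as an int and reinterpret back
            int tmp = __shfl_xor(*reinterpret_cast<int *>(&var), lane_mask, width);
            return *reinterpret_cast<half2 *>(&tmp);
        }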
2024-05-18 | llama : add support for larger Granite Code Models (20B, 34B) (#7324) | Steffen Röcker
    Tie the weights for ARCH_STARCODER to support the larger Granite code models.
    Partially addresses ggerganov/llama.cpp#7116. A few things still remain to be
    fixed; currently this requires `--override-kv tokenizer.ggml.add_bos_token=bool:false`,
    as in the example below.
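    For instance, a hypothetical invocation (the model path and prompt are
    illustrative; the override flag is quoted from the commit message):

        ./main -m granite-code-20b.Q4_K_M.gguf \
            --override-kv tokenizer.ggml.add_bos_token=bool:false \
            -p "def quicksort(arr):"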
2024-05-18 | perplexity : ndot progress and show stats with < 100 tasks (#7348) | strawberrymelonpanda
    Fix a floating point error in the ndot progress printing, and allow final
    stats to be shown for multiple-choice runs with fewer than 100 tasks.
2024-05-18 | Update and fix Vulkan soft_max and argsort implementations (#7237) | 0cc4m
    * Update and fix Vulkan softmax implementation
    * Update and fix Vulkan argsort implementation