ik_llama.cpp.git (branch: main)

Commit log

Age        | Commit message | Author
2024-05-23 | Add missing inference support for GPTNeoXForCausalLM (Pythia and GPT-NeoX bas... | fairydreaming
2024-05-23 | llama : rename n_ctx -> cache.size, less confusing (#0) | Georgi Gerganov
2024-05-23 | labeler.yml: add embedding label detector [no ci] (#7482) | Brian
2024-05-23 | ggml : remove ggml_flash_attn and ggml_flash_ff (#7463) | Georgi Gerganov
2024-05-23 | ggml : drop support for QK_K=64 (#7473) | Georgi Gerganov
2024-05-23 | Update vulkan rope implementation to support frequency factors (#7475) | 0cc4m
2024-05-23 | main : minor (#7462) | Georgi Gerganov
2024-05-23 | CUDA: fix FA out-of-bounds reads (#7479) | Johannes Gäßler
2024-05-23 | SimpleChat: a simple and dumb web front end for testing /chat/completions and... | HanishKVC
2024-05-22 | build : remove zig (#7471) | Georgi Gerganov
2024-05-22 | common : normalize naming style (#7462) | Georgi Gerganov
2024-05-22 | CUDA: fix FA out-of-bounds writes (#7465) | Johannes Gäßler
2024-05-22 | phi3 : duplicate rope factors in each layer (#7447) | slaren
2024-05-22 | vulkan: add workaround for iterator boundary check to fix clang-cl debug buil... | k.h.lai
2024-05-22 | llama : add missing model type names (#7445) | Justine Tunney
2024-05-22 | cuda : fix compile warning (#7454) | Georgi Gerganov
2024-05-22 | CUDA: remove incorrect precision check (#7454) | Johannes Gäßler
2024-05-22 | cuda : fix rope + add tests (#7452) | Georgi Gerganov
2024-05-21 | llama : add phi3 128K model support (#7225) | liuwei-git
2024-05-21 | metal : handle F16 inf values, fix FA partial offload (#7434) | Georgi Gerganov
2024-05-21 | `grammars`: fix resampling logic regression (#7424) | Olivier Chafik
2024-05-21 | CUDA: fix unused warning in mmq.cu (#7442) | Johannes Gäßler
2024-05-21 | tests : test-tokenizer-0.sh print more info (#7402) | Georgi Gerganov
2024-05-21 | examples: cache hf model when --model not provided (#7353) | Amir
2024-05-21 | CUDA: deduplicate mmq code (#7397) | Johannes Gäßler
2024-05-21 | Tokenizer SPM fixes for phi-3 and llama-spm (bugfix) (#7425) | jaime-m-p
2024-05-20 | Tokenizer SPM fixes for phi-3 and llama-spm (#7375) | jaime-m-p
2024-05-21 | llama : remove Persimmon (#7408) | Georgi Gerganov
2024-05-20 | perplexity: update README FP16 results [no ci] (#7413) | Johannes Gäßler
2024-05-20 | rpc : track allocated buffers (#7411) | Radoslav Gerganov
2024-05-20 | server : fix temperature + disable some tests (#7409) | Georgi Gerganov
2024-05-20 | [SYCL] Update SYCL upscale operation (#7321) | AidanBeltonS
2024-05-20 | Update README.md (#7410) | Bingan
2024-05-20 | ggml-opencl, llama: using reserve() if count already known (#7272) | Herman Semenov
2024-05-20 | ggml : add loongarch lsx and lasx support (#6454) | junchao-loongson
2024-05-20 | server : tuning tests (#7388) | Georgi Gerganov
2024-05-20 | server : return error on too large embedding input (#7389) | Georgi Gerganov
2024-05-20 | tests : fix --keep_split -> --keep-split (#7374) | Georgi Gerganov
2024-05-20 | Add provisions for windows support for BF16 code including CMake provision fo... | Srihari-mcw
2024-05-20 | llama : remove MPI backend (#7395) | slaren
2024-05-19 | quantize : fix --keep-split check (#7374) | Fred Douglas
2024-05-19 | Vulkan Embedding Fix (#7360) | 0cc4m
2024-05-19 | ggml : fix another case of quants nans (#7387) | slaren
2024-05-19 | ggml: implement quantized KV cache for FA (#7372) | Johannes Gäßler
2024-05-19 | server: add test for token probs (#7347) | Johannes Gäßler
2024-05-19 | server: fix seed being reported back (#7382) | Johannes Gäßler
2024-05-19 | Add StableLM2 pre-tokenizer (#7349) | Anas Ahouzi
2024-05-19 | cuda : clear error after buffer allocation failure (#7376) | slaren
2024-05-19 | labeler.yml: Use settings from ggerganov/llama.cpp [no ci] (#7363) | Brian
2024-05-19 | cmake : update android comments (#7341) | Georgi Gerganov