Age | Commit message | Author
2024-05-31 | scripts: update compare_llama_bench.py [no ci] (#7673) | Johannes Gäßler
2024-05-31 | Improve HIP compatibility (#7672) | Daniele
2024-05-31 | readme : link homebrew discussion | Georgi Gerganov
2024-05-31 | ggml : fix loongson compile warnings (#7537) | Georgi Gerganov
* ggml : fix loongson compile warnings ggml-ci
* Fix loongarch quantize test failure and an unexpected error introduced during rebase.
* tests : disable json test due to lack of Python on the CI node ggml-ci
Co-authored-by: junchao-loongson <zhaojunchao@loongson.cn>
2024-05-31 | Somehow '**' got lost (#7663) | Galunid
2024-05-31 | Add convert.py removal to hot topics (#7662) | Galunid
2024-05-31 | [no ci] docs: add aikit to readme (#7650) | Sertaç Özercan
Signed-off-by: Sertac Ozercan <sozercan@gmail.com>
2024-05-30 | Fixed painfully slow single process builds. (#7326) | JohnnyB
* Fixed painfully slow single process builds.
* Added nproc for systems that don't default to nproc
2024-05-31 | llama : cache llama_token_to_piece (#7587) | Georgi Gerganov
* llama : cache llama_token_to_piece ggml-ci
* llama : use vectors and avoid has_cache ggml-ci
* llama : throw on unknown tokenizer types ggml-ci
* llama : print a log of the total cache size
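The caching idea in the entry above lends itself to a short sketch: precompute the piece for every token id into a plain vector so later lookups are a single index. `token_to_piece` and `build_piece_cache` here are hypothetical stand-ins for illustration, not the actual llama.cpp API.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for the per-token detokenization routine.
static std::string token_to_piece(int32_t token) {
    return "piece_" + std::to_string(token);
}

// Build the cache once: one precomputed piece per token id, stored in a plain
// vector so later lookups are a simple index instead of repeated detokenization.
static std::vector<std::string> build_piece_cache(int32_t n_vocab) {
    std::vector<std::string> cache;
    cache.reserve(n_vocab);
    size_t size_total = 0;
    for (int32_t id = 0; id < n_vocab; ++id) {
        cache.push_back(token_to_piece(id));
        size_total += cache.back().size(); // the commit also logs the total cache size
    }
    (void) size_total;
    return cache;
}
```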
2024-05-31 | Fix conan badge display [no ci] (#7645) | Martin Delille
2024-05-31 | Add brew installation instruction to README [no ci] (#7616) | Manuel
2024-05-30 | readme : add Conan badge (#7638) | Martin Delille
2024-05-30 | github: add contact links to issues and convert question into research [no ci] (#7612) | Brian
2024-05-30 | Move convert.py to examples/convert-legacy-llama.py (#7430) | Galunid
* Move convert.py to examples/convert-no-torch.py
* Fix CI, scripts, readme files
* convert-no-torch -> convert-legacy-llama
* Move vocab thing to vocab.py
* Fix convert-no-torch -> convert-legacy-llama
* Fix lost convert.py in ci/run.sh
* Fix imports
* Fix gguf not imported correctly
* Fix flake8 complaints
* Fix check-requirements.sh
* Get rid of ADDED_TOKENS_FILE, FAST_TOKENIZER_FILE
* Review fixes
2024-05-30 | faster avx512 exp implementation (#7551) | Chris Elrod
* faster avx512 exp implementation
* x->r
* improve accuracy, handle special cases
* remove `e`
2024-05-30 | ggml : fix loongarch build (O2 issue) (#7636) | junchao-loongson
2024-05-30 | README: explain parallel build [no ci] (#7618) | Johannes Gäßler
2024-05-30 | [SYCL] fix intel docker (#7630) | Meng, Hengyu
* Update main-intel.Dockerfile
* workaround for https://github.com/intel/oneapi-containers/issues/70
* reset intel docker in CI
* add missed in server
2024-05-30 | gguf-py : Add tokenizer.ggml.pre to gguf-new-metadata.py (#7627) | Galunid
2024-05-29 | metal : remove invalid asserts (#7617) | Georgi Gerganov
2024-05-29 | metal : add missing asserts (#7617) | Georgi Gerganov
2024-05-29 | ggml : fix YARN + add tests + add asserts (#7617) | Georgi Gerganov
* tests : add rope tests ggml-ci
* ggml : fixes (hopefully) ggml-ci
* tests : add non-cont tests ggml-ci
* cuda : add asserts for rope/norm + fix DS2 ggml-ci
* ggml : assert contiguousness
* tests : reduce RoPE tests ggml-ci
2024-05-29 | cuda : non-cont concat support (#7610) | Georgi Gerganov
* tests : add non-cont concat tests
* cuda : non-cont concat support ggml-ci
2024-05-29 | llama-bench : add support for the RPC backend (#7435) | Radoslav Gerganov
2024-05-29 | ggml : use atomic_flag for critical section (#7598) | slaren
* ggml : use atomic_flag for critical section
* add windows shims
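The critical-section change above amounts to a spinlock built on an atomic flag. Below is a minimal C++ sketch of that technique; the real ggml code is C and adds the mentioned shims for Windows, so treat this as an illustration of the idea rather than the actual implementation.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Spinlock on std::atomic_flag: test_and_set() spins until the flag is clear,
// clear() releases it.
static std::atomic_flag g_lock = ATOMIC_FLAG_INIT;
static long long g_counter = 0;

static void critical_increment() {
    while (g_lock.test_and_set(std::memory_order_acquire)) {
        // busy-wait until the current holder releases the lock
    }
    ++g_counter; // protected region
    g_lock.clear(std::memory_order_release);
}

int main() {
    std::vector<std::thread> workers;
    for (int i = 0; i < 4; ++i) {
        workers.emplace_back([] { for (int j = 0; j < 100000; ++j) critical_increment(); });
    }
    for (auto & t : workers) { t.join(); }
    return g_counter == 4 * 100000 ? 0 : 1;
}
```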
2024-05-29 | scripts : remove mpi remnants | Georgi Gerganov
2024-05-29 | sync : ggml | Georgi Gerganov
2024-05-29 | ggml : restore ggml_rope_xpos_inplace (ggml/0) | Georgi Gerganov
ggml-ci
2024-05-29 | Add Arc A750 and Arch linux to readme-sycl.md as verified GPU model and Linux distro (#7605) | Akarshan Biswas
2024-05-29 | ggml : fix typo in ggml.c (#7603) | zhouwg
2024-05-29 | [SYCL] Align GEMM dispatch (#7566) | Meng, Hengyu
* align GEMM dispatch
2024-05-28 | Tokenizer WPM fixes (#7500) | jaime-m-p
* Update random test: add_bos_token.
* Update random test: add WPM models for testing.
* Build vocab.special_tokens_cache using vocab token types.
* Fix and improve WPM preprocessing.
  - Fix unicode edge case combinations.
  - Split by whitespace in the same pass.
* Discard all tokens when no match is found.
2024-05-28 | sycl : fix assert (#7563) | Georgi Gerganov
2024-05-28 | llama : support small Granite models (#7481) | Giuseppe Scrivano
* Add optional MLP bias for ARCH_LLAMA to support Granite models. Partially addresses ggerganov/llama.cpp/issues/7116; still needs some more changes to properly support Granite.
* llama: honor add_space_prefix from the model configuration. Propagate the add_space_prefix configuration from the HF model configuration to the gguf file and honor it with the gpt2 tokenizer.
* llama: add support for small Granite models. It works only for the small 3b and 8b models. The convert-hf-to-gguf.py script uses the vocabulary size of the Granite models to detect Granite and set the correct configuration.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Co-authored-by: Steffen Roecker <sroecker@redhat.com>
2024-05-28 | vulkan: properly initialize vulkan devices for LLAMA_SPLIT_MODE_NONE (#7552) | k.h.lai
2024-05-28 | rpc : resource management rework (#7562) | Radoslav Gerganov
* rpc : resource management rework
* address review comments
2024-05-28 | Add support for DeepseekV2ForCausalLM (#7519) | fairydreaming
* common : increase max number of experts to 160
* common : add tensors ATTN_Q_A, ATTN_Q_A_NORM, ATTN_Q_B, ATTN_KV_A_MQA, ATTN_KV_A_NORM, ATTN_KV_B needed by the DeepSeek-V2 MLA (multi-head latent attention) architecture
* common : add model header parameters: leading_dense_block_count, expert_feed_forward_length, expert_shared_count, expert_weights_scale, attention.q_lora_rank, attention.kv_lora_rank, rope.scaling.yarn_log_multiplier
* convert-hf : add model conversion support for DeepseekV2ForCausalLM
* llama : add model types for DeepSeek-V2 and DeepSeek-V2-Lite models
* llama : add two new llm_build_moe_ffn() arguments: scale_w (whether to scale weights of selected MoE experts) and w_scale (numerical value of the scaling factor)
* llama : add inference support for LLM_ARCH_DEEPSEEK2
Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>
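The new scale_w / w_scale arguments described above can be pictured with a small sketch: after the router has picked the top-k experts, their routing weights are optionally multiplied by a constant factor before the expert outputs are combined. The function below is a hypothetical simplification on plain floats; the real code operates on ggml tensors inside llm_build_moe_ffn().

```cpp
#include <vector>

// Simplified view of the scale_w / w_scale idea: optionally rescale the
// routing weights of the selected experts by a constant factor.
static void moe_scale_weights(std::vector<float> & selected_weights,
                              bool scale_w, float w_scale) {
    if (!scale_w) {
        return; // other architectures leave the router weights untouched
    }
    for (float & w : selected_weights) {
        w *= w_scale;
    }
}
```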
2024-05-28 | tests : fix test-tokenizer-0.sh | Georgi Gerganov
2024-05-28 | llama : handle unknown utf8 bytes (#7588) | Georgi Gerganov
2024-05-28 | github: add refactor to issue template (#7561) | Brian
* github: add refactor issue template [no ci]
* Update 07-refactor.yml
2024-05-28 | [SYCL] fix ggml_sycl_mul_mat_id() to match the change of api (#7436) | Neo Zhang
* fix mul_mat_id to match the change of api
* rm comment
* rm unused or duplicated code, rename as per review comment
2024-05-28 | ggml : generalize GGML_OP_CONCAT (#7563) | Georgi Gerganov
* ggml : generalize GGML_OP_CONCAT (WIP) ggml-ci
* tests : add dim != 2 tests
* metal : generalize concat kernel
* tests : naming
* cuda : generalize concat kernel ggml-ci
* sycl : add warning and assert
* ggml : fix op params handling
* metal : bugfix kernel ggml-ci
* ggml : reimplement CPU and Metal
* cuda : add asserts ggml-ci
* ggml : fix ptrs ggml-ci
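Generalizing GGML_OP_CONCAT means concatenation is no longer tied to one hard-coded dimension. The sketch below shows the shape bookkeeping for concatenating two contiguous row-major buffers along an arbitrary dimension; it is an illustration on flat std::vector data (outermost dimension first), not the ggml kernel itself, and the helper name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Concatenate two contiguous row-major tensors along dimension `dim`.
// Shapes are given outermost-first (last dimension varies fastest) and must
// match in every dimension except `dim`.
static std::vector<float> concat_dim(const std::vector<float> & a, const std::vector<int64_t> & ne_a,
                                     const std::vector<float> & b, const std::vector<int64_t> & ne_b,
                                     int dim) {
    assert(ne_a.size() == ne_b.size());
    const int n = (int) ne_a.size();

    int64_t inner = 1, outer = 1;
    for (int i = dim + 1; i < n; ++i) { assert(ne_a[i] == ne_b[i]); inner *= ne_a[i]; }
    for (int i = 0; i < dim;     ++i) { assert(ne_a[i] == ne_b[i]); outer *= ne_a[i]; }

    const int64_t block_a = ne_a[dim] * inner; // contiguous slab per outer index in a
    const int64_t block_b = ne_b[dim] * inner; // contiguous slab per outer index in b

    std::vector<float> out;
    out.reserve((size_t) (outer * (block_a + block_b)));
    for (int64_t o = 0; o < outer; ++o) {
        out.insert(out.end(), a.begin() + o * block_a, a.begin() + (o + 1) * block_a);
        out.insert(out.end(), b.begin() + o * block_b, b.begin() + (o + 1) * block_b);
    }
    return out;
}
```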
2024-05-28 | server: do not remove whitespace at the start of a completion chunk (#7524) | mgroeber9110
2024-05-28 | Markdownish code block fix (#7571) | Nathan Epstein
* markdownish codeblock fix
* updating regexes
2024-05-28 | llava : update clip.h (#7580) | Ikko Eltociear Ashimine
overriden -> overridden
2024-05-28 | update HIP_UMA #7399 (#7414) | Djip007
* update HIP_UMA #7399: add use of hipMemAdviseSetCoarseGrain when LLAMA_HIP_UMA is enabled. Gets 2x on prompt eval and 1.5x on token gen with ROCm 6.0 on a Ryzen 7940HX iGPU (780M/gfx1103).
* simplify code, more consistent style
Co-authored-by: slaren <slarengh@gmail.com>
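A rough sketch of the UMA path the change above speeds up, assuming a HIP toolchain: allocate managed (unified) memory and then advise the runtime to treat it as coarse-grained. The helper name is hypothetical; in llama.cpp the equivalent logic sits inside the ggml HIP backend behind the LLAMA_HIP_UMA option.

```cpp
#include <hip/hip_runtime.h>
#include <cstdio>

// Allocate a buffer in unified memory and mark it coarse-grained, which is the
// advice this commit adds to avoid fine-grained coherence overhead on iGPUs.
static void * alloc_uma_buffer(size_t size, int device) {
    void * ptr = nullptr;
    if (hipMallocManaged(&ptr, size) != hipSuccess) {
        return nullptr;
    }
    // best effort: ignore failure on platforms without coarse-grain support
    (void) hipMemAdvise(ptr, size, hipMemAdviseSetCoarseGrain, device);
    return ptr;
}

int main() {
    void * buf = alloc_uma_buffer(1 << 20, /*device =*/ 0);
    std::printf("UMA buffer: %p\n", buf);
    if (buf != nullptr) {
        (void) hipFree(buf);
    }
    return 0;
}
```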
2024-05-28 | adding in x64 targets to cmake presets (#7574) | kunnis
2024-05-27 | make: add --device-debug to NVCC debug flags (#7542) | Johannes Gäßler
2024-05-27 | Allow multiple copy function pointers for CUDA graph kernel param updates (#7565) | agray3
CUDA graphs require parameter updates to kernels associated with GGML_OP_CPY nodes. Previously the implementation only checked for a single CUDA kernel in such nodes, but this caused a bug in cases where 2 such kernels exist. This fixes the issue by using a vector to allow multiple function pointers to be stored and checked against. Fixes #7942
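The fix described above replaces a single remembered copy-kernel pointer with a collection. Below is a small sketch of that bookkeeping, with illustrative types and names rather than the actual ggml-cuda symbols.

```cpp
#include <algorithm>
#include <vector>

// Track every CPY kernel seen while capturing the graph instead of just one,
// then match nodes against the whole set when updating kernel parameters.
using kernel_ptr = void (*)();

struct cpy_kernel_registry {
    std::vector<kernel_ptr> ptrs; // previously a single pointer

    // remember a kernel function pointer if it is not already known
    void record(kernel_ptr p) {
        if (std::find(ptrs.begin(), ptrs.end(), p) == ptrs.end()) {
            ptrs.push_back(p);
        }
    }

    // check whether a node's kernel is one of the recorded CPY kernels
    bool is_cpy_kernel(kernel_ptr p) const {
        return std::find(ptrs.begin(), ptrs.end(), p) != ptrs.end();
    }
};
```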
2024-05-27 | Fix q_xxs using mul_mat_q (#7459) | AidanBeltonS