Age | Commit message | Author |
|
There are a couple of things to note in this architecture:
1. Shared input and output embedding parameters.
2. Key length and value length are not derived from `n_embd`.
More information about the models can be found at
https://ai.google.dev/gemma. GGUFs can be downloaded from
https://huggingface.co/google.
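A minimal sketch, in ggml terms, of what point 1 means: the same token-embedding tensor is used both for the input lookup and as the output projection, so no separate lm_head weight is stored. Tensor names and shapes here are illustrative, not the actual Gemma build code in llama.cpp.

```cpp
#include "ggml.h"

// Sketch: the same token-embedding matrix serves both as the input lookup
// table and as the output (logits) projection, so no separate lm_head tensor
// is stored. Shapes follow ggml's [ne0, ne1] convention and are illustrative.
static struct ggml_tensor * tied_embedding_logits(
        struct ggml_context * ctx,
        struct ggml_tensor  * tok_embd,   // [n_embd, n_vocab]
        struct ggml_tensor  * tokens,     // [n_tokens], I32 token ids
        struct ggml_tensor  * hidden) {   // [n_embd, n_tokens], final hidden states
    // input side: embedding lookup through tok_embd
    struct ggml_tensor * inp = ggml_get_rows(ctx, tok_embd, tokens);
    (void) inp; // would feed the transformer blocks in a real graph

    // output side: the very same tok_embd acts as the output projection,
    // producing [n_vocab, n_tokens] logits
    return ggml_mul_mat(ctx, tok_embd, hidden);
}
```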
|
|
* [SYCL] context: add name
* name should start with SYCL*
|
|
* iq4_nl: squash commits for easier rebase
* Basics (quantize, dequantize)
* CUDA dequantize and dot product
* Slightly faster CUDA dot product (120 t/s)
* Switch to 6-bit scales
* Scalar dot product
* AVX2 dot product
* ARM_NEON dot product
* Works on metal, but still slow
* Slightly better Metal dot product
* Another small Metal improvement
* Metal dot product is getting there
* Faster CUDA dot product
* Add 1/8 ffn_down layers as Q5_K when no imatrix has been provided
* Report the actual bpw
* Add _xs mix that is 4.05 bpw for non-MoE models
* Remove IQ4_XS for now, slightly adjust kvalues_iq4nl
* AVX2 dot product uses Q8_0 instead of Q8_K
* Add to test-backend-ops
* Minor fix
* Also use Q5_K for attn_output in MoE models
* Fixes after merging latest master
* Switching to blocks of 32
* AVX2 for blocks of 32
* Scalar dot product for blocks of 32
* ARM_NEON dot product for blocks of 32
* Metal kernels for blocks of 32
* Slightly faster Metal kernels
* iq4_nl: Fix after merging with master
* iq4_nl: another fix after merging with master
* Use IQ4_NL instead of Q4_K when using k-quants is not possible
* Fix typo that makes several tests fail
* It was the ggml_vdotq call that was missing inside the brackets
---------
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
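A rough sketch of the scheme the log above describes: blocks of 32 weights stored as 4-bit indices into a non-linear 16-entry codebook, scaled by a per-block factor. The codebook values and the nibble layout below are illustrative placeholders, not the real kvalues_iq4nl table or block layout in ggml.

```cpp
#include <cstdint>

// Illustrative non-linear 4-bit codebook: 16 signed levels, denser near zero.
// Placeholder values -- not the real kvalues_iq4nl table from ggml-quants.
static const int8_t kvalues_sketch[16] = {
    -127, -96, -72, -52, -36, -24, -14, -6, 2, 10, 20, 32, 48, 68, 92, 120,
};

// Dequantize one block of 32 weights: a per-block scale d plus 16 bytes of
// packed 4-bit indices (the nibble layout here is illustrative).
static void dequantize_block_32(float d, const uint8_t qs[16], float out[32]) {
    for (int i = 0; i < 16; ++i) {
        out[2*i + 0] = d * kvalues_sketch[qs[i] & 0x0F];
        out[2*i + 1] = d * kvalues_sketch[qs[i] >> 4];
    }
}
```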
|
|
* server: initial working llava 1.6 support
* move clip_image to header
* remove commented code
* remove c++ style from header
* remove todo
* expose llava_image_embed_make_with_clip_img
* fix zig build
|
|
|
|
This commit contains a suggestion for the README.md in the llava
example. The suggestion adds explicit instructions for how to convert
a llava-1.6 model and run it using llava-cli.
The motivation for this is that having explicit instructions similar to
the 1.5 instructions will make it easier for users to try this out.
Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
|
|
* server: use llama_chat_apply_template
* server: remove trailing space
* server: fix format_chat
* server: fix help message
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* server: fix formatted_chat
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
|
|
* Add maid to ui list
* Specify licence
|
|
* add build support for embedded metal library
* Update Makefile
---------
Co-authored-by: Haoxiang Fei <feihaoxiang@idea.edu.cn>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
|
|
|
|
* Update ggml_sycl_op_mul_mat_vec_q
* Apply suggestions from code review
Co-authored-by: Abhilash Majumder <30946547+abhilash1910@users.noreply.github.com>
* revert suggestion on macro
* fix bug
* Add quant type GGML_TYPE_IQ1_S to unsupported
* fix format
---------
Co-authored-by: Abhilash Majumder <30946547+abhilash1910@users.noreply.github.com>
|
|
Author: Philip Taron <philip.taron@gmail.com>
Date: Tue Feb 13 20:28:02 2024 +0000
|
|
|
|
up ggml_vk_instance_init()
|
|
|
|
Based on work by @rbourgeat in https://github.com/ggerganov/llama.cpp/pull/5322/files
|
|
Refs:
- https://chat.openai.com/share/7020ce72-65fc-45ec-b7be-9d9d798a5f3f
- https://github.com/SaschaWillems/Vulkan/issues/954
- https://github.com/haasn/libplacebo/issues/128
- https://github.com/KhronosGroup/Vulkan-Samples/issues/476
|
|
Closes #5304
|
|
* cuda : ignore peer access already enabled errors
* fix hip
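For context, the first bullet amounts to treating cudaErrorPeerAccessAlreadyEnabled as a no-op rather than a fatal error. A simplified host-side C++ sketch using the CUDA runtime API, not the actual backend code:

```cpp
#include <cuda_runtime.h>
#include <cstdio>

// Enable peer access from `device` to `peer`, treating "already enabled" as success.
static bool enable_peer_access(int device, int peer) {
    cudaSetDevice(device);
    cudaError_t err = cudaDeviceEnablePeerAccess(peer, 0);
    if (err == cudaErrorPeerAccessAlreadyEnabled) {
        // not a real failure: clear the sticky error state and carry on
        (void) cudaGetLastError();
        return true;
    }
    if (err != cudaSuccess) {
        fprintf(stderr, "peer access %d -> %d failed: %s\n", device, peer, cudaGetErrorString(err));
        return false;
    }
    return true;
}
```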
|
|
|
|
* support minLength and maxLength in JSON schema grammar converter
* Update examples/json-schema-to-grammar.py
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
|
|
|
|
|
|
ggml-ci
|
|
|
|
* ggml : embed Metal library source (ggml-metal.metal) into binary
enable by setting WHISPER_EMBED_METAL_LIBRARY
* rename the build option
* rename the preprocessor directive
* generate the Metal library embedding assembly on the fly during the build process
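A sketch of the embedding idea from the C/C++ side: the build emits a small assembly stub that `.incbin`s ggml-metal.metal between two symbols, and the runtime reads the source text from those symbols instead of loading a file. The symbol names below are assumptions made for this sketch.

```cpp
#include <string>

// Provided by a generated assembly stub along the lines of:
//     _start_symbol: .incbin "ggml-metal.metal"
//     _end_symbol:
// The symbol names are assumptions for this sketch.
extern const char ggml_metallib_start[];
extern const char ggml_metallib_end[];

// Return the embedded Metal source as a string, so the backend can compile it
// directly instead of locating ggml-metal.metal on disk at runtime.
static std::string embedded_metal_source() {
    return std::string(ggml_metallib_start, ggml_metallib_end - ggml_metallib_start);
}
```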
|
|
* cmake : pass -Werror through -Xcompiler
ggml-ci
* make, cmake : enable CUDA errors on warnings
ggml-ci
|
|
|
|
|
|
* rm unwanted sycl compile options
* fix bug
* fix bug
* format fix
|
|
|
|
This is a follow-up to commit fc0c8d286a533363a9a663510b62af85ffad58b3
("llava : update surgery script to not remove tensors"), but this time
the change is to the BakLLaVA-specific part of the surgery script.
I've been able to test this using SkunkworksAI/BakLLaVA-1 and it works
as expected using the instructions in README.md.
Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
|
|
* Fixed the baby-llama issue (see issue #4830)
* minor : fix whitespaces
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
|
|
* llama: add llama_chat_apply_template
* test-chat-template: remove redundant vector
* chat_template: do not use std::string for buffer
* add clarification for llama_chat_apply_template
* llama_chat_apply_template: add zephyr template
* llama_chat_apply_template: correct docs
* llama_chat_apply_template: use term "chat" everywhere
* llama_chat_apply_template: change variable name to "tmpl"
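A minimal usage sketch for the new API, assuming the llama.h prototype of the time (llama_chat_message with role/content fields, and a call that writes into a caller-provided buffer and returns the required length); check the header for the exact signature before relying on it.

```cpp
#include "llama.h"

#include <cstdint>
#include <string>
#include <vector>

// Format a short conversation with the model's built-in chat template
// (tmpl == nullptr). Returns the formatted prompt, or "" on failure.
static std::string format_chat_prompt(const llama_model * model) {
    std::vector<llama_chat_message> chat = {
        { "system", "You are a helpful assistant." },
        { "user",   "Hello!"                       },
    };

    std::string buf(1024, '\0');
    int32_t res = llama_chat_apply_template(model, /*tmpl =*/ nullptr,
                                            chat.data(), chat.size(),
                                            /*add_ass =*/ true,
                                            &buf[0], (int32_t) buf.size());
    if (res < 0) {
        return "";
    }
    if ((size_t) res > buf.size()) {
        // the call reports the required size when the buffer is too small
        buf.resize(res);
        res = llama_chat_apply_template(model, nullptr, chat.data(), chat.size(),
                                        true, &buf[0], (int32_t) buf.size());
    }
    buf.resize(res > 0 ? res : 0);
    return buf;
}
```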
|
|
* cuda : fix nans in soft_max
* metal : fix nans in soft_max
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
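For context on the class of bug: a soft_max row that is fully masked with -INF can end up as 0/0 = NaN unless handled. A scalar C++ sketch of the safe formulation, not the actual CUDA/Metal kernels:

```cpp
#include <cmath>
#include <cstddef>
#include <limits>

// Numerically safe softmax over one row: subtracting the row max avoids
// overflow, and a fully masked row (all -INF) yields zeros instead of NaNs.
static void soft_max_row(const float * x, float * y, size_t n) {
    float max_val = -std::numeric_limits<float>::infinity();
    for (size_t i = 0; i < n; ++i) {
        max_val = std::fmax(max_val, x[i]);
    }

    const bool all_masked = max_val == -std::numeric_limits<float>::infinity();

    float sum = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        // exp(-INF - max) == 0, so masked positions contribute nothing
        y[i] = all_masked ? 0.0f : std::exp(x[i] - max_val);
        sum += y[i];
    }

    const float inv_sum = sum > 0.0f ? 1.0f/sum : 0.0f;
    for (size_t i = 0; i < n; ++i) {
        y[i] *= inv_sum;
    }
}
```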
|
|
Added 1.5-bit to README.md
|
|
* #ifdef out some NUMA code blocks for Android due to lack of support
* added some __ANDROID__ #ifdef gates around the NUMA code and forced glibc prior to 2.29 to use a syscall for getcpu instead of the wrapper
* Changed the gates on NUMA platform-specific code to __gnu_linux__ to skip any platforms without glibc
* Harmonized the #if defined blocks for the NUMA code on __gnu_linux__, since that is the only model being followed anyway (a sketch of the resulting gating pattern follows below)
---------
Co-authored-by: root <root@nenya.lothlorien.ca>
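A hedged sketch of the final gating pattern described above: NUMA-specific code behind __gnu_linux__, with a raw getcpu syscall fallback when glibc is older than 2.29 (the version that introduced the getcpu() wrapper). The exact macros and helpers in the real change may differ.

```cpp
#if defined(__gnu_linux__)
#include <sched.h>
#include <unistd.h>
#include <sys/syscall.h>

// Return the CPU the calling thread is running on, or -1 on failure.
static int current_cpu(void) {
#if defined(__GLIBC__) && (__GLIBC__ > 2 || (__GLIBC__ == 2 && __GLIBC_MINOR__ >= 29))
    unsigned int cpu = 0, node = 0;
    return getcpu(&cpu, &node) == 0 ? (int) cpu : -1;
#else
    // glibc < 2.29 has no getcpu() wrapper: fall back to the raw syscall
    unsigned int cpu = 0, node = 0;
    return syscall(SYS_getcpu, &cpu, &node, nullptr) == 0 ? (int) cpu : -1;
#endif
}
#else
// non-glibc platforms (including Android/Bionic): NUMA code path disabled
static int current_cpu(void) { return -1; }
#endif
```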
|
|
* build : pass all warning flags to nvcc via -Xcompiler
* make : fix apparent mis-merge from #3952
* make : fix incorrect GF_CC_VER for CUDA host compiler
|
|
|
|
ggml-ci
|
|
|
|
* Feature - surface min_keep as its own parameter
* Updated README with min_keep param
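For readers unfamiliar with the parameter: min_keep puts a floor on how many candidates the truncating samplers (top-p and friends) may keep. A generic sketch of the idea, not the server's actual implementation:

```cpp
#include <cstddef>
#include <vector>

struct candidate { int token; float p; }; // candidates sorted by p, descending

// Keep tokens until the cumulative probability reaches top_p, but never keep
// fewer than min_keep candidates.
static void top_p_truncate(std::vector<candidate> & cands, float top_p, size_t min_keep) {
    float cum = 0.0f;
    size_t keep = cands.size();
    for (size_t i = 0; i < cands.size(); ++i) {
        cum += cands[i].p;
        // stop only once both the mass threshold and the min_keep floor are met
        if (cum >= top_p && i + 1 >= min_keep) {
            keep = i + 1;
            break;
        }
    }
    cands.resize(keep);
}
```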
|
|
|
|
|
|
|
|
* server: enrich health endpoint with available slots, return 503 if no slots are available (see the sketch below)
* server: document new status no slot available in the README.md
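A hedged sketch of the behaviour described in the first bullet, with illustrative field names (the real endpoint lives in examples/server): report slot usage and answer 503 when no slot is idle.

```cpp
#include <string>
#include <utility>

// Build the /health response: HTTP status plus a small JSON body.
// Field names are illustrative, not necessarily what the server returns.
static std::pair<int, std::string> health_response(int slots_idle, int slots_processing) {
    const bool ok = slots_idle > 0;
    const std::string body =
        std::string("{\"status\":\"") + (ok ? "ok" : "no slot available") + "\","
        "\"slots_idle\":"       + std::to_string(slots_idle) + ","
        "\"slots_processing\":" + std::to_string(slots_processing) + "}";
    return { ok ? 200 : 503, body };
}
```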
|
|
* server: document --n-predict
* server: ensure client request cannot override n_predict if set
* server: fix the LF in the printed usage for the new --n-predict option
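The second bullet boils down to clamping the per-request value against the server-wide setting; a sketch with assumed variable names:

```cpp
#include <algorithm>
#include <cstdint>

// If the server was started with a global --n-predict limit, a client request
// may only lower it, never raise or unset it. Variable names are illustrative.
static int32_t effective_n_predict(int32_t server_n_predict, int32_t request_n_predict) {
    if (server_n_predict < 0) {
        return request_n_predict;              // no server-side cap configured
    }
    if (request_n_predict < 0) {
        return server_n_predict;               // "unlimited" request gets capped
    }
    return std::min(server_n_predict, request_n_predict);
}
```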
|
|
This updates the server queue to support graceful shutdown of the server on signals.
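A condensed sketch of the usual pattern (a signal handler flips an atomic flag; the queue loop polls it with a timed wait, since notifying a condition variable from a signal handler is not async-signal-safe, and then drains and exits); the actual server code is more involved.

```cpp
#include <atomic>
#include <chrono>
#include <condition_variable>
#include <csignal>
#include <mutex>

static std::atomic<bool> g_running{true};
static std::condition_variable g_cv;
static std::mutex g_mutex;

// Signal handler: only flip the flag; the queue loop does the real work.
static void on_shutdown_signal(int) {
    g_running.store(false);
}

// Queue loop: wake up periodically (or on new work) and exit cleanly when asked.
static void queue_loop() {
    std::signal(SIGINT,  on_shutdown_signal);
    std::signal(SIGTERM, on_shutdown_signal);

    while (g_running.load()) {
        std::unique_lock<std::mutex> lock(g_mutex);
        // timed wait: the flag is re-checked on a short interval because the
        // handler cannot safely notify the condition variable
        g_cv.wait_for(lock, std::chrono::milliseconds(100), [] {
            return !g_running.load(); // or: pending tasks are available
        });
        // ... pop and process pending tasks here ...
    }
    // ... drain remaining tasks, release slots, free the model ...
}
```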
|
|
|
|
|