ik_llama.cpp.git

author	snadampal <87143774+snadampal@users.noreply.github.com>	2024-02-11 07:22:33 -0600
committer	GitHub <noreply@github.com>	2024-02-11 15:22:33 +0200
commit	a07d0fee1f05c5c1dc49948ae1a3293db017275f (patch)
tree	06614ff1364269493e4853333ced56802abd7284 /common/sampling.cpp
parent	e4640d8fdf56f14a6db3d092bcd3d2d315cb5d04 (diff)

ggml : add mmla kernels for quantized GEMM (#4966)

* ggml: aarch64: implement smmla kernel for q8_0_q8_0 quantized gemm

  armv8.2-a and above supports MMLA instructions that have higher throughput than DOT. This commit adds an mmla kernel for q8_0_q8_0 gemm. The feature is enabled if the platform supports "__ARM_FEATURE_MATMUL_INT8".

  On AWS Graviton3 processors this kernel resulted in up to a 1.5x improvement in prompt evaluation throughput compared to the default sdot kernel.

* ggml: aarch64: implement smmla kernel for q4_0_q8_0 quantized gemm

  armv8.2-a and above supports MMLA instructions that have higher throughput than DOT. This commit adds an mmla kernel for q4_0_q8_0 gemm. The feature is enabled if the platform supports "__ARM_FEATURE_MATMUL_INT8".

  On AWS Graviton3 processors this kernel resulted in up to a 1.5x improvement in prompt evaluation throughput compared to the default sdot kernel.

* ggml: aarch64: implement smmla kernel for q4_1_q8_1 quantized gemm

  armv8.2-a and above supports MMLA instructions that have higher throughput than DOT. This commit adds an mmla kernel for q4_1_q8_1 gemm. The feature is enabled if the platform supports "__ARM_FEATURE_MATMUL_INT8".

  On AWS Graviton3 processors this kernel resulted in up to a 1.5x improvement in prompt evaluation throughput compared to the default sdot kernel.

* ggml: update unit tests for the new vec_dot interface

* llama.cpp: add MATMUL_INT8 capability to system_info
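The compile-time gating described in the commit message can be illustrated with a minimal sketch. It is not the code from this commit: the function names `has_matmul_int8`, `mmla_tile_ref`, and `mmla_tile` are hypothetical, and quantization scales and block layouts are left out. It shows only the integer core of one SMMLA step: a 2x8 int8 tile multiplied by the transpose of a second 2x8 int8 tile, accumulated into a 2x2 int32 tile. When `__ARM_FEATURE_MATMUL_INT8` is defined, the ACLE intrinsic `vmmlaq_s32` is used; otherwise a scalar loop computes the same result.

```c
#include <stdint.h>
#if defined(__ARM_FEATURE_MATMUL_INT8)
#include <arm_neon.h>
#endif

/* Hypothetical helper: reports whether this build was compiled with the
 * int8 matrix-multiply extension enabled (similar in spirit to the
 * MATMUL_INT8 entry the commit adds to system_info). */
static int has_matmul_int8(void) {
#if defined(__ARM_FEATURE_MATMUL_INT8)
    return 1;
#else
    return 0;
#endif
}

/* Scalar reference: a and b each hold two rows of eight int8 values.
 * acc is a row-major 2x2 tile: acc[2*i + j] += dot(a row i, b row j). */
static void mmla_tile_ref(int32_t acc[4], const int8_t a[16], const int8_t b[16]) {
    for (int i = 0; i < 2; i++) {
        for (int j = 0; j < 2; j++) {
            int32_t sum = 0;
            for (int k = 0; k < 8; k++) {
                sum += (int32_t)a[8 * i + k] * (int32_t)b[8 * j + k];
            }
            acc[2 * i + j] += sum;
        }
    }
}

/* Same operation, using the SMMLA instruction when the compiler
 * advertises it and falling back to the scalar loop otherwise. */
static void mmla_tile(int32_t acc[4], const int8_t a[16], const int8_t b[16]) {
#if defined(__ARM_FEATURE_MATMUL_INT8)
    int32x4_t r = vld1q_s32(acc);
    r = vmmlaq_s32(r, vld1q_s8(a), vld1q_s8(b));
    vst1q_s32(acc, r);
#else
    mmla_tile_ref(acc, a, b);
#endif
}
```

One SMMLA step produces four dot products of length eight from two 128-bit inputs, whereas one SDOT step produces four dot products of length four, which is the source of the higher throughput the commit message refers to. On GCC or Clang the macro is typically defined by building with an architecture flag that includes `+i8mm`.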

Diffstat (limited to 'common/sampling.cpp')

0 files changed, 0 insertions, 0 deletions
