From f6793491b5af6da75edad34d6f503ef86d31b09f Mon Sep 17 00:00:00 2001
From: "Nam D. Tran" <42194884+namtranase@users.noreply.github.com>
Date: Wed, 27 Dec 2023 22:39:45 +0700
Subject: llama : add AWQ for llama, llama2, mpt, and mistral models (#4593)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* update: awq support llama-7b model

* update: change order

* update: benchmark results for llama2-7b

* update: mistral 7b v1 benchmark

* update: support 4 models

* fix: Readme

* update: ready for PR

* update: readme

* fix: readme

* update: change order import

* black

* format code

* update: work for bot mpt and awqmpt

* update: readme

* Rename to llm_build_ffn_mpt_awq

* Formatted other files

* Fixed params count

* fix: remove code

* update: more detail for mpt

* fix: readme

* fix: readme

* update: change folder architecture

* fix: common.cpp

* fix: readme

* fix: remove ggml_repeat

* update: cicd

* update: cicd

* uppdate: remove use_awq arg

* update: readme

* llama : adapt plamo to new ffn

ggml-ci

---------

Co-authored-by: Trần Đức Nam <v.namtd12@vinai.io>
Co-authored-by: Le Hoang Anh <v.anhlh33@vinai.io>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
---
 gguf-py/gguf/tensor_mapping.py | 5 +++++
 1 file changed, 5 insertions(+)

(limited to 'gguf-py/gguf/tensor_mapping.py')

diff --git a/gguf-py/gguf/tensor_mapping.py b/gguf-py/gguf/tensor_mapping.py
index 446c6b68..0b8f7041 100644
--- a/gguf-py/gguf/tensor_mapping.py
+++ b/gguf-py/gguf/tensor_mapping.py
@@ -188,6 +188,11 @@ class TensorNameMap:
             "model.layers.{bid}.block_sparse_moe.experts.{xid}.w3", # mixtral
         ),
 
+        # AWQ-activation gate
+        MODEL_TENSOR.FFN_ACT: (
+            "transformer.blocks.{bid}.ffn.act",  # mpt
+        ),
+
         # Feed-forward gate
         MODEL_TENSOR.FFN_GATE: (
             "model.layers.{bid}.mlp.gate_proj",           # llama-hf refact
-- 
cgit v1.2.3