| author | Nam D. Tran <42194884+namtranase@users.noreply.github.com> | 2023-12-27 22:39:45 +0700 |
|---|---|---|
| committer | GitHub <noreply@github.com> | 2023-12-27 17:39:45 +0200 |
| commit | f6793491b5af6da75edad34d6f503ef86d31b09f (patch) | |
| tree | ba50b7ae1aba91cb465a06970a11137baab7afcf /convert.py | |
| parent | 879b690a9e1eb1ab0a29b58236fc76978fb4d902 (diff) | |
llama : add AWQ for llama, llama2, mpt, and mistral models (#4593)
* update: awq support llama-7b model
* update: change order
* update: benchmark results for llama2-7b
* update: mistral 7b v1 benchmark
* update: support 4 models
* fix: Readme
* update: ready for PR
* update: readme
* fix: readme
* update: change order import
* black
* format code
* update: work for both mpt and awq mpt
* update: readme
* Rename to llm_build_ffn_mpt_awq
* Formatted other files
* Fixed params count
* fix: remove code
* update: more detail for mpt
* fix: readme
* fix: readme
* update: change folder architecture
* fix: common.cpp
* fix: readme
* fix: remove ggml_repeat
* update: cicd
* update: cicd
* update: remove use_awq arg
* update: readme
* llama : adapt plamo to new ffn
ggml-ci
---------
Co-authored-by: Trần Đức Nam <v.namtd12@vinai.io>
Co-authored-by: Le Hoang Anh <v.anhlh33@vinai.io>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Diffstat (limited to 'convert.py')
-rwxr-xr-x | convert.py | 14 |
1 file changed, 14 insertions, 0 deletions
```diff
@@ -1187,6 +1187,7 @@ def main(args_in: list[str] | None = None) -> None:
         # We currently only support Q8_0 output on little endian systems.
         output_choices.append("q8_0")
     parser = argparse.ArgumentParser(description="Convert a LLaMa model to a GGML compatible file")
+    parser.add_argument("--awq-path", type=Path, help="Path to scale awq cache file", default=None)
     parser.add_argument("--dump", action="store_true", help="don't convert, just show what's in the model")
     parser.add_argument("--dump-single", action="store_true", help="don't convert, just show what's in a single model file")
     parser.add_argument("--vocab-only", action="store_true", help="extract only the vocab")
@@ -1200,6 +1201,19 @@ def main(args_in: list[str] | None = None) -> None:
     parser.add_argument("--padvocab", action="store_true", help="add pad tokens when model vocab expects more than tokenizer metadata provides")
 
     args = parser.parse_args(args_in)
+    if args.awq_path:
+        sys.path.insert(1, str(Path(__file__).parent / 'awq-py'))
+        from awq.apply_awq import add_scale_weights
+        tmp_model_path = args.model / "weighted_model"
+        if tmp_model_path.is_dir():
+            print(f"{tmp_model_path} exists as a weighted model.")
+        else:
+            tmp_model_path.mkdir(parents=True, exist_ok=True)
+            print("Saving new weighted model ...")
+            add_scale_weights(str(args.model), str(args.awq_path), str(tmp_model_path))
+            print(f"Saved weighted model at {tmp_model_path}.")
+        args.model = tmp_model_path
+
     if args.dump_single:
         model_plus = lazy_load_file(args.model)
         do_dump_model(model_plus)
```
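The new `--awq-path` branch only stages the model: the actual rescaling happens in `add_scale_weights` from the repo's `awq-py` package, which this diff does not show. As a rough sketch of the underlying AWQ idea only (our illustration, not the awq-py implementation), activation-aware weight quantization folds pre-computed per-input-channel scales into each linear weight so that salient channels lose less precision when quantized:

```python
# Illustrative sketch of the AWQ scale-folding transform. This is NOT the
# code behind awq-py's add_scale_weights, just the operation it is built on.
import torch

def fold_awq_scales(weight: torch.Tensor, scales: torch.Tensor) -> torch.Tensor:
    """Scale each input channel of a linear layer's weight.

    weight: [out_features, in_features]
    scales: [in_features], pre-computed offline from activation statistics
    """
    # Multiplying weight columns by s (while the producing layer's output is
    # divided by s) leaves the float model mathematically unchanged, but
    # enlarges salient channels so quantization rounds them less harshly.
    return weight * scales.unsqueeze(0)
```

In terms of the hunk above, a run such as `python convert.py <model-dir> --awq-path <awq-cache-file>` (paths of your choosing) writes the rescaled copy to `<model-dir>/weighted_model` and then converts that directory in place of the original.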