| author | M. Yusuf Sarıgöz <yusufsarigoz@gmail.com> | 2023-10-12 18:23:18 +0300 |
|---|---|---|
| committer | GitHub <noreply@github.com> | 2023-10-12 18:23:18 +0300 |
| commit | 370359e5baf619f3a8d461023143d1494b1e8fde (patch) | |
| tree | acfd94911cdb83780f7afc3a703b8abb31aa00e2 /examples/llava/llava-surgery.py | |
| parent | 9e24cc6e2e589d405bd1720c400f5b0b9d0ca3ee (diff) | |
examples: support LLaVA v1.5 (multimodal model) (#3436)
* WIP: start implementing LLaVA
* rm scratch buf for now, will revert after cleanup
* LLaVA image encoder is working. will combine with llama
* Add llava inference code, but it's buggy. debugging
* LLaVA is working e2e, needs to optimize memory allocation + cleanup
* Use ggml_allocr + rm unnecessary code
* fix: crlf -> lf
* fix: new line at EoF
* fix: trailing whitespace
* Add readme
* Update readme
* Some cleanup
* Are you happy editorconfig?
* rm unused batch image preprocessing
* rm unused import
* fix: rm designated initializers
* introduce pad-to-square mode for non-square images (see the sketch after this commit message)
* are you happy editorconfig?
* gitignore /llava
* Handle cases where image file does not exist
* add llava target to Makefile
* add support for 13b model variant
* Maybe seed is unlucky?
* Check if apples are compared to apples
* are you happy editorconfig?
* Use temperature = 0.1 by default
* command line: use gpt_params_parse()
* minor
* handle default n_predict
* fix typo
* llava : code formatting, rename files, fix compile warnings
* do not use Wno-cast-qual for MSVC
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
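One of the changes listed above introduces pad-to-square preprocessing for non-square input images. Below is a minimal, hypothetical sketch of that idea in Python with Pillow; the function name, fill colour, and centred placement are assumptions for illustration and are not taken from the clip/llava code in this commit.

```python
# Hypothetical sketch of pad-to-square preprocessing (not the code from this commit).
# Assumption: non-square images are padded onto a square canvas before being fed
# to the vision encoder; the real fill colour (e.g. the encoder's mean pixel value)
# and placement may differ.
from PIL import Image


def pad_to_square(img: Image.Image, fill=(127, 127, 127)) -> Image.Image:
    """Pad a non-square RGB image to a square canvas, keeping it centred."""
    w, h = img.size
    if w == h:
        return img
    side = max(w, h)
    canvas = Image.new("RGB", (side, side), fill)
    canvas.paste(img, ((side - w) // 2, (side - h) // 2))
    return canvas


# usage: square = pad_to_square(Image.open("photo.jpg").convert("RGB"))
```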
Diffstat (limited to 'examples/llava/llava-surgery.py')
-rw-r--r-- | examples/llava/llava-surgery.py | 30 |
1 file changed, 30 insertions(+), 0 deletions(-)
diff --git a/examples/llava/llava-surgery.py b/examples/llava/llava-surgery.py
new file mode 100644
index 00000000..26294d9b
--- /dev/null
+++ b/examples/llava/llava-surgery.py
@@ -0,0 +1,30 @@
+import argparse
+import glob
+import os
+import torch
+
+
+ap = argparse.ArgumentParser()
+ap.add_argument("-m", "--model", help="Path to LLaVA v1.5 model")
+args = ap.parse_args()
+
+# find the model part that includes the multimodal projector weights
+path = sorted(glob.glob(f"{args.model}/pytorch_model*.bin"))[-1]
+checkpoint = torch.load(path)
+
+# get a list of mm tensor names
+mm_tensors = [k for k, v in checkpoint.items() if k.startswith("model.mm_projector")]
+
+# store these tensors in a new dictionary and torch.save them
+projector = {name: checkpoint[name] for name in mm_tensors}
+torch.save(projector, f"{args.model}/llava.projector")
+
+# remove these tensors from the checkpoint and save it again
+for name in mm_tensors:
+    del checkpoint[name]
+
+torch.save(checkpoint, path)
+
+print("Done!")
+print(f"Now you can convert {args.model} to a regular LLaMA GGUF file.")
+print(f"Also, use {args.model}/llava.projector to prepare a llava-encoder.gguf file.")
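For orientation, here is a hedged sketch of how the script above might be run and its effect checked. The model directory path is an assumption for illustration; only the `-m/--model` flag, the `llava.projector` output file, and the `model.mm_projector` tensor prefix come from the script itself.

```python
# Hypothetical usage sketch for llava-surgery.py; the model path is an assumed
# local directory holding a LLaVA v1.5 checkpoint split into pytorch_model*.bin shards.
import glob
import subprocess

import torch

model_dir = "models/llava-v1.5-7b"  # assumption: local HF-style checkpoint directory

# run the surgery script shown in the diff above
subprocess.run(["python", "examples/llava/llava-surgery.py", "-m", model_dir], check=True)

# llava.projector should now contain only the model.mm_projector.* tensors
projector = torch.load(f"{model_dir}/llava.projector")
print(sorted(projector.keys()))

# and the last checkpoint shard should no longer contain them
last_shard = sorted(glob.glob(f"{model_dir}/pytorch_model*.bin"))[-1]
checkpoint = torch.load(last_shard)
assert not any(k.startswith("model.mm_projector") for k in checkpoint)
```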