author | M. Yusuf Sarıgöz <yusufsarigoz@gmail.com> | 2023-10-12 18:23:18 +0300 |
---|---|---|
committer | GitHub <noreply@github.com> | 2023-10-12 18:23:18 +0300 |
commit | 370359e5baf619f3a8d461023143d1494b1e8fde (patch) | |
tree | acfd94911cdb83780f7afc3a703b8abb31aa00e2 /common/common.cpp | |
parent | 9e24cc6e2e589d405bd1720c400f5b0b9d0ca3ee (diff) |
examples: support LLaVA v1.5 (multimodal model) (#3436)
* WIP: start implementing LLaVA
* rm scratch buf for now, will revert after cleanup
* LLaVA image encoder is working. will combine with llama
* Add llava inference code, but it's buggy. debugging
* LLaVA is working e2e, needs to optimize memory allocation + cleanup
* Use ggml_allocr + rm unnecessary code
* fix: crlf -> lf
* fix: new line at EoF
* fix: trailing whitespace
* Add readme
* Update readme
* Some cleanup
* Are you happy editorconfig?
* rm unused batch image preprocessing
* rm unused import
* fix: rm designated initializers
* introduce pad-to-square mode for non-square images
* are you happy editorconfig?
* gitignore /llava
* Handle cases where image file does not exist
* add llava target to Makefile
* add support for 13b model variant
* Maybe seed is unlucky?
* Check if apples are compared to apples
* are you happy editorconfig?
* Use temperature = 0.1 by default
* command line: use gpt_params_parse()
* minor
* handle default n_predict
* fix typo
* llava : code formatting, rename files, fix compile warnings
* do not use Wno-cast-qual for MSVC
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Diffstat (limited to 'common/common.cpp')
-rw-r--r-- | common/common.cpp | 14 |
1 files changed, 14 insertions, 0 deletions
diff --git a/common/common.cpp b/common/common.cpp
index 4214e63a..9c4f7df2 100644
--- a/common/common.cpp
+++ b/common/common.cpp
@@ -384,6 +384,18 @@ bool gpt_params_parse(int argc, char ** argv, gpt_params & params) {
                 break;
             }
             params.lora_base = argv[i];
+        } else if (arg == "--mmproj") {
+            if (++i >= argc) {
+                invalid_param = true;
+                break;
+            }
+            params.mmproj = argv[i];
+        } else if (arg == "--image") {
+            if (++i >= argc) {
+                invalid_param = true;
+                break;
+            }
+            params.image = argv[i];
         } else if (arg == "-i" || arg == "--interactive") {
             params.interactive = true;
         } else if (arg == "--embedding") {
@@ -703,6 +715,8 @@ void gpt_print_usage(int /*argc*/, char ** argv, const gpt_params & params) {
     printf(" -np N, --parallel N number of parallel sequences to decode (default: %d)\n", params.n_parallel);
     printf(" -ns N, --sequences N number of sequences to decode (default: %d)\n", params.n_sequences);
     printf(" -cb, --cont-batching enable continuous batching (a.k.a dynamic batching) (default: disabled)\n");
+    printf(" --mmproj MMPROJ_FILE path to a multimodal projector file for LLaVA. see examples/llava/README.md\n");
+    printf(" --image IMAGE_FILE path to an image file. use with multimodal models\n");
     if (llama_mlock_supported()) {
         printf(" --mlock force system to keep model in RAM rather than swapping or compressing\n");
     }
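
The two new branches follow the same pattern as the surrounding options: advance the index, fail if no value follows the flag, otherwise store the value. The sketch below isolates that pattern so it can be compiled on its own. It is not the code from common.cpp: `mm_params` and `parse_mm_args` are hypothetical names, the arguments arrive as a `std::vector<std::string>` rather than `argc`/`argv`, and a missing value or an unknown flag is reported through the return value instead of the `invalid_param` flag used in the diff.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the two fields this commit adds to gpt_params.
struct mm_params {
    std::string mmproj; // path to the multimodal projector file
    std::string image;  // path to the input image
};

// Parses only --mmproj and --image.
// Returns false when a flag has no value after it or the flag is unknown.
bool parse_mm_args(const std::vector<std::string> & args, mm_params & params) {
    for (std::size_t i = 0; i < args.size(); ++i) {
        const std::string & arg = args[i];
        if (arg == "--mmproj") {
            // Same guard as in the diff: step to the value, bail out if absent.
            if (++i >= args.size()) {
                return false;
            }
            params.mmproj = args[i];
        } else if (arg == "--image") {
            if (++i >= args.size()) {
                return false;
            }
            params.image = args[i];
        } else {
            return false;
        }
    }
    return true;
}
```

Calling `parse_mm_args` with `{"--mmproj", "proj.gguf", "--image", "cat.jpg"}` fills both fields, while `{"--image"}` alone is rejected because no file name follows the flag. The file names here are placeholders.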