| Field | Value | Date |
|---|---|---|
| author | M. Yusuf Sarıgöz <yusufsarigoz@gmail.com> | 2023-10-12 18:23:18 +0300 |
| committer | GitHub <noreply@github.com> | 2023-10-12 18:23:18 +0300 |
| commit | 370359e5baf619f3a8d461023143d1494b1e8fde | |
| tree | acfd94911cdb83780f7afc3a703b8abb31aa00e2 /common/common.h | |
| parent | 9e24cc6e2e589d405bd1720c400f5b0b9d0ca3ee | |
examples: support LLaVA v1.5 (multimodal model) (#3436)
* WIP: start implementing LLaVA
* rm scratch buf for now, will revert after cleanup
* LLaVA image encoder is working. will combine with llama
* Add llava inference code, but it's buggy. debugging
* LLaVA is working e2e, needs to optimize memory allocation + cleanup
* Use ggml_allocr + rm unnecessary code
* fix: crlf -> lf
* fix: new line at EoF
* fix: trailing whitespace
* Add readme
* Update readme
* Some cleanup
* Are you happy editorconfig?
* rm unused batch image preprocessing
* rm unused import
* fix: rm designated initializers
* introduce pad-to-square mode for non-square images
* are you happy editorconfig?
* gitignore /llava
* Handle cases where image file does not exist
* add llava target to Makefile
* add support for 13b model variant
* Maybe seed is unlucky?
* Check if apples are compared to apples
* are you happy editorconfig?
* Use temperature = 0.1 by default
* command line: use gpt_params_parse()
* minor
* handle default n_predict
* fix typo
* llava : code formatting, rename files, fix compile warnings
* do not use Wno-cast-qual for MSVC
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
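The log only names the "pad-to-square mode for non-square images" without showing it. A minimal C++ sketch of the idea follows. It assumes a hypothetical `RgbImage` type (interleaved 8-bit RGB, row-major), centers the original image, and fills the border with a caller-chosen color. The actual `examples/llava` code may position the image or pick the fill color differently.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical image type for illustration only; not the type used in examples/llava.
struct RgbImage {
    int width = 0;
    int height = 0;
    std::vector<uint8_t> data; // row-major, 3 bytes (R, G, B) per pixel
};

// Pads a non-square image onto a square canvas of side max(width, height).
// The border is filled with `fill` and the source image is centered.
RgbImage pad_to_square(const RgbImage & src, const std::array<uint8_t, 3> & fill) {
    assert(src.data.size() == static_cast<size_t>(src.width) * src.height * 3);

    const int side = std::max(src.width, src.height);
    RgbImage dst;
    dst.width  = side;
    dst.height = side;
    dst.data.resize(static_cast<size_t>(side) * side * 3);

    // Paint the whole canvas with the fill color first.
    for (size_t i = 0; i < dst.data.size(); i += 3) {
        dst.data[i]     = fill[0];
        dst.data[i + 1] = fill[1];
        dst.data[i + 2] = fill[2];
    }

    // Copy each source row into the centered position on the canvas.
    const int off_x = (side - src.width) / 2;
    const int off_y = (side - src.height) / 2;
    for (int y = 0; y < src.height; ++y) {
        const uint8_t * src_row = src.data.data() + static_cast<size_t>(y) * src.width * 3;
        uint8_t * dst_row = dst.data.data() + (static_cast<size_t>(y + off_y) * side + off_x) * 3;
        std::copy(src_row, src_row + static_cast<size_t>(src.width) * 3, dst_row);
    }
    return dst;
}
```

For a 4×2 image, `side` is 4 and `off_y` is 1, so rows 0 and 3 of the result hold the fill color and rows 1 and 2 hold the original pixels. Padding instead of stretching keeps the aspect ratio intact before the image is resized for the encoder.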
Diffstat (limited to 'common/common.h')
-rw-r--r-- | common/common.h | 4 |
1 file changed, 4 insertions, 0 deletions
```diff
diff --git a/common/common.h b/common/common.h
index fa115536..36fd4416 100644
--- a/common/common.h
+++ b/common/common.h
@@ -104,6 +104,10 @@ struct gpt_params {
     bool numa = false; // attempt optimizations that help on some NUMA systems
     bool verbose_prompt = false; // print prompt tokens before generation
     bool infill = false; // use infill mode
+
+    // multimodal models (see examples/llava)
+    std::string mmproj = ""; // path to multimodal projector
+    std::string image = ""; // path to an image file
 };
 
 bool gpt_params_parse(int argc, char ** argv, gpt_params & params);
```
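This diff only declares the two new `gpt_params` fields; the code that fills them is outside this diff, which is limited to `common/common.h`. The sketch below illustrates how a caller might populate and check them. It uses a trimmed-down stand-in struct, and the flag names `--mmproj` and `--image`, the helper `parse_mm_args`, and the helper `mm_params_valid` are all hypothetical, not the real `gpt_params_parse()` behavior. The existence check mirrors the log entry "Handle cases where image file does not exist".

```cpp
#include <cassert>
#include <cstddef>
#include <filesystem>
#include <string>
#include <system_error>
#include <vector>

// Stand-in for gpt_params holding only the two fields added by this diff.
struct mm_params {
    std::string mmproj = ""; // path to multimodal projector
    std::string image  = ""; // path to an image file
};

// Hypothetical parser: the flag names "--mmproj" and "--image" are assumptions.
// Returns false if a flag is given without a value; other arguments are ignored.
bool parse_mm_args(const std::vector<std::string> & args, mm_params & params) {
    for (size_t i = 0; i < args.size(); ++i) {
        const std::string & arg = args[i];
        if (arg != "--mmproj" && arg != "--image") {
            continue; // not a multimodal flag in this sketch
        }
        if (i + 1 >= args.size()) {
            return false; // flag at the end of argv with no path after it
        }
        if (arg == "--mmproj") {
            params.mmproj = args[++i];
        } else {
            params.image = args[++i];
        }
    }
    return true;
}

// Hypothetical check: both paths must be set and the image file must exist.
bool mm_params_valid(const mm_params & params) {
    if (params.mmproj.empty() || params.image.empty()) {
        return false;
    }
    std::error_code ec; // non-throwing overload; errors count as "missing"
    return std::filesystem::exists(params.image, ec);
}
```

Keeping both fields as plain paths with empty-string defaults lets non-multimodal examples ignore them entirely; only a multimodal example needs to test whether they were set.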