| author | Daniel Bevenius <daniel.bevenius@gmail.com> | 2024-02-16 10:24:39 +0100 |
|---|---|---|
| committer | GitHub <noreply@github.com> | 2024-02-16 11:24:39 +0200 |
| commit | 60ed04cf82dc91ade725dd7ad53f0ee81f76eccf (patch) | |
| tree | 1701f0f8e59921c846561659f682ac78f19beb46 /examples/llava/README.md | |
| parent | 594845aab1c6775877f6d9545a51dc0f8d0b3d77 (diff) | |
llava : fix clip-model-is-vision flag in README.md (#5509)
* llava: fix clip-model-is-vision flag in README.md
This commit fixes the flag `--clip_model_is_vision` in README.md, which
does not match the actual flag of the conversion script:
```console
$ python convert-image-encoder-to-gguf.py --help
...
--clip-model-is-vision
The clip model is a pure vision model
(ShareGPT4V vision extract for example)
```
Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
* llava: update link to vit config in README.md
Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
---------
Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
Diffstat (limited to 'examples/llava/README.md')
-rw-r--r-- | examples/llava/README.md | 4 |
1 file changed, 2 insertions, 2 deletions
```diff
diff --git a/examples/llava/README.md b/examples/llava/README.md
index 1d5374f2..57eb4293 100644
--- a/examples/llava/README.md
+++ b/examples/llava/README.md
@@ -63,8 +63,8 @@ Now both the LLaMA part and the image encoder is in the `llava-v1.5-7b` director
 1) Backup your pth/safetensor model files as llava-surgery modifies them
 2) Use `python llava-surgery-v2.py -C -m /path/to/hf-model` which also supports llava-1.5 variants pytorch as well as safetensor models:
 - you will find a llava.projector and a llava.clip file in your model directory
-3) Copy the llava.clip file into a subdirectory (like vit), rename it to pytorch_model.bin and add a fitting vit configuration to the directory (https://huggingface.co/cmp-nct/llava-1.6-gguf/blob/main/config.json)
-4) Create the visual gguf model: `python ./examples/llava/convert-image-encoder-to-gguf.py -m ../path/to/vit --llava-projector ../path/to/llava.projector --output-dir ../path/to/output --clip_model_is_vision`
+3) Copy the llava.clip file into a subdirectory (like vit), rename it to pytorch_model.bin and add a fitting vit configuration to the directory (https://huggingface.co/cmp-nct/llava-1.6-gguf/blob/main/config_vit.json) and rename it to config.json.
+4) Create the visual gguf model: `python ./examples/llava/convert-image-encoder-to-gguf.py -m ../path/to/vit --llava-projector ../path/to/llava.projector --output-dir ../path/to/output --clip-model-is-vision`
 - This is similar to llava-1.5, the difference is that we tell the encoder that we are working with the pure vision model part of CLIP
 5) Everything else as usual: convert.py the hf model, quantize as needed
 **note** llava-1.6 needs more context than llava-1.5, at least 3000 is needed (just run it at -c 4096)
```
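For reference, a minimal sketch of the corrected steps 2–4 end to end. The paths below (`~/models/llava-v1.6` and the `vit` subdirectory) are placeholders, not paths taken from this commit; only the flags shown in the README diff are used.

```console
# Hypothetical model location; adjust to your checkout and model directory.
$ python ./examples/llava/llava-surgery-v2.py -C -m ~/models/llava-v1.6

# Copy the extracted CLIP weights into a vit subdirectory and rename them,
# then save the vit config referenced in the README next to them as config.json:
#   https://huggingface.co/cmp-nct/llava-1.6-gguf/blob/main/config_vit.json
$ mkdir ~/models/llava-v1.6/vit
$ cp ~/models/llava-v1.6/llava.clip ~/models/llava-v1.6/vit/pytorch_model.bin

# Convert the image encoder, using the dash-separated flag fixed by this commit.
$ python ./examples/llava/convert-image-encoder-to-gguf.py \
    -m ~/models/llava-v1.6/vit \
    --llava-projector ~/models/llava-v1.6/llava.projector \
    --output-dir ~/models/llava-v1.6 \
    --clip-model-is-vision
```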