author Amir <amir_zia@outlook.com> 2024-05-21 17:13:12 +0300
committer GitHub <noreply@github.com> 2024-05-21 17:13:12 +0300
commit 11474e756de3f56b760986e73086d40e787e52f8 (patch)
tree ffb1c5369b3e7e8f128a114c7a7f1b5899376ac9 /examples
parent d8ee90222791afff2ab666ded4cb6195fd94cced (diff)
examples: cache hf model when --model not provided (#7353)
* examples: cache hf model when --model not provided
Diffstat (limited to 'examples')
-rw-r--r-- examples/main/README.md | 2
1 file changed, 2 insertions(+), 0 deletions(-)
diff --git a/examples/main/README.md b/examples/main/README.md
index 97e2ae4c..ee930f4e 100644
--- a/examples/main/README.md
+++ b/examples/main/README.md
@@ -325,3 +325,5 @@ These options provide extra functionality and customization when running the LLa
- `-ts SPLIT, --tensor-split SPLIT`: When using multiple GPUs this option controls how large tensors should be split across all GPUs. `SPLIT` is a comma-separated list of non-negative values that assigns the proportion of data that each GPU should get in order. For example, "3,2" will assign 60% of the data to GPU 0 and 40% to GPU 1. By default the data is split in proportion to VRAM but this may not be optimal for performance.
- `--lora FNAME`: Apply a LoRA (Low-Rank Adaptation) adapter to the model (implies --no-mmap). This allows you to adapt the pretrained model to specific tasks or domains.
- `--lora-base FNAME`: Optional model to use as a base for the layers modified by the LoRA adapter. This flag is used in conjunction with the `--lora` flag, and specifies the base model for the adaptation.
+
+- `-hfr URL, --hf-repo URL`: The URL of the Hugging Face model repository. Used in conjunction with `--hf-file` or `-hff`. The model is downloaded and stored in the file given by `-m` or `--model`. If `-m` is not provided, the model is automatically stored in the path specified by the `LLAMA_CACHE` environment variable or in an OS-specific local cache.
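
As a rough illustration of the caching behaviour this change documents, here is a hypothetical invocation. The repository name, GGUF file name, prompt, and the `./main` binary path are placeholders chosen for the example; only `--hf-repo`, `--hf-file`, `-m`/`--model`, and `LLAMA_CACHE` come from the text above.

```bash
# No -m/--model is given, so the downloaded GGUF file is cached
# (in the directory named by LLAMA_CACHE, or an OS-specific default).
./main \
    --hf-repo TheBloke/Llama-2-7B-GGUF \
    --hf-file llama-2-7b.Q4_K_M.gguf \
    -p "Hello"

# Pinning the cache location explicitly via the environment variable:
LLAMA_CACHE=/tmp/llama_cache ./main \
    --hf-repo TheBloke/Llama-2-7B-GGUF \
    --hf-file llama-2-7b.Q4_K_M.gguf \
    -p "Hello"
```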