Diffstat (limited to 'examples')
-rw-r--r-- | examples/llava/MobileVLM-README.md | 2
-rw-r--r-- | examples/llava/README.md | 2
-rw-r--r-- | examples/main/README.md | 2
-rw-r--r-- | examples/perplexity/README.md | 31
-rw-r--r-- | examples/quantize/README.md | 22
5 files changed, 29 insertions, 30 deletions
diff --git a/examples/llava/MobileVLM-README.md b/examples/llava/MobileVLM-README.md
index 96b04852..413e433d 100644
--- a/examples/llava/MobileVLM-README.md
+++ b/examples/llava/MobileVLM-README.md
@@ -22,7 +22,7 @@ After building, run: `./llava-cli` to see the usage. For example:
 
 ## Model conversion
 
-- Clone `mobileVLM-1.7B` and `clip-vit-large-patch14-336` locally:
+1. Clone `mobileVLM-1.7B` and `clip-vit-large-patch14-336` locally:
 
 ```sh
 git clone https://huggingface.co/mtgv/MobileVLM-1.7B
diff --git a/examples/llava/README.md b/examples/llava/README.md
index 67cb0f22..d4810d42 100644
--- a/examples/llava/README.md
+++ b/examples/llava/README.md
@@ -24,7 +24,7 @@ After building, run: `./llava-cli` to see the usage. For example:
 
 ## LLaVA 1.5
 
-- Clone a LLaVA and a CLIP model ([available options](https://github.com/haotian-liu/LLaVA/blob/main/docs/MODEL_ZOO.md)). For example:
+1. Clone a LLaVA and a CLIP model ([available options](https://github.com/haotian-liu/LLaVA/blob/main/docs/MODEL_ZOO.md)). For example:
 
 ```sh
 git clone https://huggingface.co/liuhaotian/llava-v1.5-7b
diff --git a/examples/main/README.md b/examples/main/README.md
index bb696b56..10a589ce 100644
--- a/examples/main/README.md
+++ b/examples/main/README.md
@@ -310,7 +310,7 @@ These options help improve the performance and memory usage of the LLaMA models.
 
 ### Quantization
 
-For information about 4-bit quantization, which can significantly improve performance and reduce memory usage, please refer to llama.cpp's primary [README](../../README.md#prepare-data--run).
+For information about 4-bit quantization, which can significantly improve performance and reduce memory usage, please refer to llama.cpp's primary [README](../../README.md#prepare-and-quantize).
 
 ## Additional Options
 
diff --git a/examples/perplexity/README.md b/examples/perplexity/README.md
index 50e1af01..1a8c0dd6 100644
--- a/examples/perplexity/README.md
+++ b/examples/perplexity/README.md
@@ -3,19 +3,18 @@
 TODO
 
 ## Llama 2 70B Scorechart
-Quantization | Model size (GiB) | Perplexity | Delta to fp16
--- | -- | -- | --
-Q4_0 | 36.20 | 3.5550 | 3.61%
-Q4_1 | 40.20 | 3.5125 | 2.37%
-Q5_0 | 44.20 | 3.4744 | 1.26%
-Q2_K | 27.27 | 3.7339 | 8.82%
-Q3_K_S | 27.86 | 3.7019 | 7.89%
-Q3_K_M | 30.83 | 3.5932 | 4.72%
-Q3_K_L | 33.67 | 3.5617 | 3.80%
-Q4_K_S | 36.39 | 3.4852 | 1.57%
-Q4_K_M | 38.54 | 3.4725 | 1.20%
-Q5_K_S | 44.20 | 3.4483 | 0.50%
-Q5_K_M | 45.41 | 3.4451 | 0.40%
-Q6_K | 52.70 | 3.4367 | 0.16%
-fp16 | 128.5 | 3.4313 | -
-
+| Quantization | Model size (GiB) | Perplexity | Delta to fp16 |
+|--------------|------------------|------------|---------------|
+| Q4_0         | 36.20            | 3.5550     | 3.61%         |
+| Q4_1         | 40.20            | 3.5125     | 2.37%         |
+| Q5_0         | 44.20            | 3.4744     | 1.26%         |
+| Q2_K         | 27.27            | 3.7339     | 8.82%         |
+| Q3_K_S       | 27.86            | 3.7019     | 7.89%         |
+| Q3_K_M       | 30.83            | 3.5932     | 4.72%         |
+| Q3_K_L       | 33.67            | 3.5617     | 3.80%         |
+| Q4_K_S       | 36.39            | 3.4852     | 1.57%         |
+| Q4_K_M       | 38.54            | 3.4725     | 1.20%         |
+| Q5_K_S       | 44.20            | 3.4483     | 0.50%         |
+| Q5_K_M       | 45.41            | 3.4451     | 0.40%         |
+| Q6_K         | 52.70            | 3.4367     | 0.16%         |
+| fp16         | 128.5            | 3.4313     | -             |
diff --git a/examples/quantize/README.md b/examples/quantize/README.md
index c8b9a27a..8a10365c 100644
--- a/examples/quantize/README.md
+++ b/examples/quantize/README.md
@@ -4,17 +4,17 @@ TODO
 
 ## Llama 2 7B
 
-Quantization | Bits per Weight (BPW)
--- | --
-Q2_K | 3.35
-Q3_K_S | 3.50
-Q3_K_M | 3.91
-Q3_K_L | 4.27
-Q4_K_S | 4.58
-Q4_K_M | 4.84
-Q5_K_S | 5.52
-Q5_K_M | 5.68
-Q6_K | 6.56
+| Quantization | Bits per Weight (BPW) |
+|--------------|-----------------------|
+| Q2_K         | 3.35                  |
+| Q3_K_S       | 3.50                  |
+| Q3_K_M       | 3.91                  |
+| Q3_K_L       | 4.27                  |
+| Q4_K_S       | 4.58                  |
+| Q4_K_M       | 4.84                  |
+| Q5_K_S       | 5.52                  |
+| Q5_K_M       | 5.68                  |
+| Q6_K         | 6.56                  |
 
 ## Llama 2 13B
 Quantization | Bits per Weight (BPW)
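
Review note: the diff only reformats the tables, so the numbers should be unchanged. As a sanity check on the Llama 2 70B scorechart, the sketch below recomputes the "Delta to fp16" column from the perplexity values. The formula (relative increase over the fp16 perplexity, in percent) is an assumption, since the README does not state it, and `delta_to_fp16` is a hypothetical helper name rather than anything in llama.cpp.

```python
# Perplexity values copied from the Llama 2 70B scorechart in the diff.
FP16_PERPLEXITY = 3.4313

PERPLEXITY = {
    "Q4_0": 3.5550,
    "Q4_1": 3.5125,
    "Q5_0": 3.4744,
    "Q2_K": 3.7339,
    "Q3_K_S": 3.7019,
    "Q3_K_M": 3.5932,
    "Q3_K_L": 3.5617,
    "Q4_K_S": 3.4852,
    "Q4_K_M": 3.4725,
    "Q5_K_S": 3.4483,
    "Q5_K_M": 3.4451,
    "Q6_K": 3.4367,
}


def delta_to_fp16(perplexity: float, baseline: float = FP16_PERPLEXITY) -> float:
    """Relative perplexity increase over the fp16 baseline, in percent.

    Assumed formula: (ppl / ppl_fp16 - 1) * 100.
    """
    return (perplexity / baseline - 1.0) * 100.0


if __name__ == "__main__":
    # Print one row per quantization type for comparison with the table.
    for name, ppl in PERPLEXITY.items():
        print(f"{name:<8} {ppl:.4f} {delta_to_fp16(ppl):.2f}%")
```

Under this assumed formula, the values rounded to two decimals can be compared row by row against the "Delta to fp16" column on both sides of the diff.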