author    BarfingLemurs <128182951+BarfingLemurs@users.noreply.github.com>    2023-10-06 15:13:36 -0400
committer GitHub <noreply@github.com>    2023-10-06 22:13:36 +0300
commit    1faaae8c2bdc4a21302e367e0754c3fe74a8113e (patch)
tree      95ff021842873eed42163d8f49ce13368dfdcf8e
parent    cb13d73a720c42d1958bff79b6869d77b26b8cea (diff)
readme : update models, cuda + ppl instructions (#3510)
-rw-r--r--  README.md  27
1 file changed, 14 insertions(+), 13 deletions(-)
diff --git a/README.md b/README.md
index e436818f..05627956 100644
--- a/README.md
+++ b/README.md
@@ -95,6 +95,7 @@ as the main playground for developing new features for the [ggml](https://github
- [X] [Aquila-7B](https://huggingface.co/BAAI/Aquila-7B) / [AquilaChat-7B](https://huggingface.co/BAAI/AquilaChat-7B)
- [X] [Starcoder models](https://github.com/ggerganov/llama.cpp/pull/3187)
- [X] [Mistral AI v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1)
+- [X] [Refact](https://huggingface.co/smallcloudai/Refact-1_6B-fim)
**Bindings:**
@@ -377,7 +378,7 @@ Building the program with BLAS support may lead to some performance improvements
- #### cuBLAS
- This provides BLAS acceleration using the CUDA cores of your Nvidia GPU. Make sure to have the CUDA toolkit installed. You can download it from your Linux distro's package manager or from here: [CUDA Toolkit](https://developer.nvidia.com/cuda-downloads).
+ This provides BLAS acceleration using the CUDA cores of your Nvidia GPU. Make sure to have the CUDA toolkit installed. You can download it from your Linux distro's package manager (e.g. `apt install nvidia-cuda-toolkit`) or from here: [CUDA Toolkit](https://developer.nvidia.com/cuda-downloads).
- Using `make`:
```bash
make LLAMA_CUBLAS=1
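# Alternatively, via CMake (a sketch, not part of this patch; it assumes the
# LLAMA_CUBLAS option exists in this revision's CMakeLists.txt):
mkdir build && cd build
cmake .. -DLLAMA_CUBLAS=ON
cmake --build . --config Release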
@@ -613,6 +614,18 @@ For more information, see [https://huggingface.co/docs/transformers/perplexity](
The perplexity measurements in the table above are done against the `wikitext2` test dataset (https://paperswithcode.com/dataset/wikitext-2), with a context length of 512.
The time per token is measured on a MacBook M1 Pro 32GB RAM using 4 and 8 threads.
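For reference, "perplexity" here is the standard metric (see the link above): the exponentiated average negative log-likelihood of the evaluated tokens, where lower is better. In plain-text form:
```
PPL = exp( -(1/N) * sum_{i=1..N} log p(token_i | token_1 .. token_{i-1}) )
```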
+#### How to run
+
+1. Download/extract: https://s3.amazonaws.com/research.metamind.io/wikitext/wikitext-2-raw-v1.zip?ref=salesforce-research
+2. Run `./perplexity -m models/7B/ggml-model-q4_0.gguf -f wiki.test.raw`
+3. Output:
+```
+perplexity : calculating perplexity over 655 chunks
+24.43 seconds per pass - ETA 4.45 hours
+[1]4.5970,[2]5.1807,[3]6.0382,...
+```
+The ETA is simply chunks × seconds per pass: 655 × 24.43 s ≈ 16,000 s ≈ 4.45 hours, after which the final perplexity is printed. The full sequence is sketched below.
+
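+A sketch of the complete sequence (the extracted directory name is taken from the archive's layout, and the model path is only an example):
+```bash
+wget "https://s3.amazonaws.com/research.metamind.io/wikitext/wikitext-2-raw-v1.zip?ref=salesforce-research" -O wikitext-2-raw-v1.zip
+unzip wikitext-2-raw-v1.zip
+./perplexity -m models/7B/ggml-model-q4_0.gguf -f wikitext-2-raw/wiki.test.raw
+```
+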
### Interactive mode
If you want a more ChatGPT-like experience, you can run in interactive mode by passing `-i` as a parameter.
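For example, a typical interactive invocation might look like this (the model path, reverse prompt, and token count are illustrative, not fixed values):

```bash
./main -m models/7B/ggml-model-q4_0.gguf --color -i -r "User:" -n 256
```

Here `-r "User:"` is a reverse prompt: generation pauses and control returns to you whenever the model emits that string.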
@@ -775,18 +788,6 @@ If your issue is with model generation quality, then please at least scan the fo
- [Aligning language models to follow instructions](https://openai.com/research/instruction-following)
- [Training language models to follow instructions with human feedback](https://arxiv.org/abs/2203.02155)
-#### How to run
-
-1. Download/extract: https://s3.amazonaws.com/research.metamind.io/wikitext/wikitext-2-raw-v1.zip?ref=salesforce-research
-2. Run `./perplexity -m models/7B/ggml-model-q4_0.gguf -f wiki.test.raw`
-3. Output:
-```
-perplexity : calculating perplexity over 655 chunks
-24.43 seconds per pass - ETA 4.45 hours
-[1]4.5970,[2]5.1807,[3]6.0382,...
-```
-And after 4.45 hours, you will have the final perplexity.
-
### Android
#### Building the Project using Android NDK
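A typical NDK build uses the toolchain file that ships with the NDK (a sketch, not this section's verbatim instructions; `$NDK`, the ABI, and the platform level are assumptions to adapt to your setup):

```bash
mkdir build-android && cd build-android
cmake .. -DCMAKE_TOOLCHAIN_FILE=$NDK/build/cmake/android.toolchain.cmake \
         -DANDROID_ABI=arm64-v8a -DANDROID_PLATFORM=android-23
make
```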