author    Kawrakow <48489457+ikawrakow@users.noreply.github.com>  2024-02-06 19:00:16 +0200
committer GitHub <noreply@github.com>  2024-02-06 19:00:16 +0200
commit    b08f22c882a1443e6b97081f3ce718a4d1a741f8 (patch)
tree      ddfe289fed86e1d59a21ea2d6f625ff44620eec5
parent    f57fadc009cbff741a1961cb7896c47d73978d2c (diff)
Update README.md (#5366)
Add some links to quantization related PRs
-rw-r--r--  README.md | 14
1 file changed, 13 insertions(+), 1 deletion(-)
diff --git a/README.md b/README.md
index cc87ac79..34f2021f 100644
--- a/README.md
+++ b/README.md
@@ -736,9 +736,21 @@ Several quantization methods are supported. They differ in the resulting model d
| 13B | bits/weight | 16.0 | 4.5 | 5.0 | 5.5 | 6.0 | 8.5 |
- [k-quants](https://github.com/ggerganov/llama.cpp/pull/1684)
-- recent k-quants improvements
+- recent k-quants improvements and new i-quants
- [#2707](https://github.com/ggerganov/llama.cpp/pull/2707)
- [#2807](https://github.com/ggerganov/llama.cpp/pull/2807)
+ - [#4773 - 2-bit i-quants (inference)](https://github.com/ggerganov/llama.cpp/pull/4773)
+ - [#4856 - 2-bit i-quants (inference)](https://github.com/ggerganov/llama.cpp/pull/4856)
+ - [#4861 - importance matrix](https://github.com/ggerganov/llama.cpp/pull/4861)
+ - [#4872 - MoE models](https://github.com/ggerganov/llama.cpp/pull/4872)
+ - [#4897 - 2-bit quantization](https://github.com/ggerganov/llama.cpp/pull/4897)
+ - [#4930 - imatrix for all k-quants](https://github.com/ggerganov/llama.cpp/pull/4930)
+ - [#4957 - imatrix on the GPU](https://github.com/ggerganov/llama.cpp/pull/4957)
+ - [#4969 - imatrix for legacy quants](https://github.com/ggerganov/llama.cpp/pull/4969)
+ - [#4996 - k-quants tuning](https://github.com/ggerganov/llama.cpp/pull/4996)
+ - [#5060 - Q3_K_XS](https://github.com/ggerganov/llama.cpp/pull/5060)
+ - [#5196 - 3-bit i-quants](https://github.com/ggerganov/llama.cpp/pull/5196)
+ - [quantization tuning](https://github.com/ggerganov/llama.cpp/pull/5320), [another one](https://github.com/ggerganov/llama.cpp/pull/5334), and [another one](https://github.com/ggerganov/llama.cpp/pull/5361)
### Perplexity (measuring model quality)
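The bits/weight row in the table context above maps directly to on-disk model size. A minimal sketch of that arithmetic (the helper name is hypothetical, and the 13B parameter count is taken at face value from the table):

```python
# Hypothetical helper: estimate quantized model size from the
# bits/weight figures in the README's quantization table.
def quantized_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate model size in GB (decimal) for a given parameter
    count and average bits per weight (8 bits per byte)."""
    return n_params * bits_per_weight / 8 / 1e9

# Example: a 13B model at ~4.5 bits/weight (one of the table's columns)
print(round(quantized_size_gb(13e9, 4.5), 2))   # ~7.31 GB
# The same model at full 16-bit precision
print(round(quantized_size_gb(13e9, 16.0), 2))  # 26.0 GB
```

This is only the weight storage; runtime memory also includes the KV cache and activation buffers.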