author    | Kawrakow <48489457+ikawrakow@users.noreply.github.com> | 2024-08-12 15:16:00 +0200
committer | GitHub <noreply@github.com>                            | 2024-08-12 15:16:00 +0200
commit    | bb5ff6fadec40c2e3aa3033dc68bec9367a0c9cc (patch)
tree      | fae18a44f38876c88975554114af742496bc9498
parent    | 8f43e551038af2547b5c01d0e9edd641c0e4bd29 (diff)
Update README.md
-rw-r--r-- | README.md | 2
1 file changed, 1 insertion(+), 1 deletion(-)
@@ -9,7 +9,7 @@ This repository is a clone of [llama.cpp](https://github.com/ggerganov/llama.cpp)
 * Faster CPU inference for MoE models with similar performance gains
 * Implementation of the [Bitnet b1.58](https://huggingface.co/1bitLLM/bitnet_b1_58-3B) model for the CPU (`AVX2` and `ARM_NEON`) and GPU (`CUDA` and `Metal`). This implementation is much faster than the unmerged `llama.cpp` [PR-8151](https://github.com/ggerganov/llama.cpp/pull/8151)
 
-If you are not already familiar with [llama.cpp](https://github.com/ggerganov/llama.cpp), it is better to start there. For those familiar with `llama.cpp`, everything here works the same as in `llama.cpp` (or at least the way `llama.cpp` worked when I last synced on June 21).
+If you are not already familiar with [llama.cpp](https://github.com/ggerganov/llama.cpp), it is better to start there. For those familiar with `llama.cpp`, everything here works the same as in `llama.cpp` (or at least the way `llama.cpp` worked when I last synced on Aug 12 2024).
 
 Note that I have published some, but not all, of the code in this repository in a series of [llamafile](https://github.com/Mozilla-Ocho/llamafile) PRs ([394](https://github.com/Mozilla-Ocho/llamafile/pull/394), [405](https://github.com/Mozilla-Ocho/llamafile/pull/405), [428](https://github.com/Mozilla-Ocho/llamafile/pull/428), [435](https://github.com/Mozilla-Ocho/llamafile/pull/435), [453](https://github.com/Mozilla-Ocho/llamafile/pull/453), and [464](https://github.com/Mozilla-Ocho/llamafile/pull/464))