author | Georgi Gerganov <ggerganov@gmail.com> | 2024-02-21 15:39:54 +0200 |
---|---|---|
committer | GitHub <noreply@github.com> | 2024-02-21 15:39:54 +0200 |
commit | c14f72db9c62d71d35eb1c141745c0bd0cb27b49 (patch) | |
tree | ae5bec4ccc0cf75ed769faf9a6a0691a83ae99aa | |
parent | cc6cac08e38e32bf40bbe07e9e8f8f0130b5fd94 (diff) | |
readme : update hot topics
-rw-r--r-- | README.md | 9 |
1 file changed, 2 insertions, 7 deletions
```diff
@@ -10,13 +10,8 @@ Inference of Meta's [LLaMA](https://arxiv.org/abs/2302.13971) model (and others)
 
 ### Hot topics
 
-- Remove LLAMA_MAX_DEVICES and LLAMA_SUPPORTS_GPU_OFFLOAD: https://github.com/ggerganov/llama.cpp/pull/5240
-- Incoming backends: https://github.com/ggerganov/llama.cpp/discussions/5138
-  - [SYCL backend](README-sycl.md) is ready (1/28/2024), support Linux/Windows in Intel GPUs (iGPU, Arc/Flex/Max series)
-- New SOTA quantized models, including pure 2-bits: https://huggingface.co/ikawrakow
-- Collecting Apple Silicon performance stats:
-  - M-series: https://github.com/ggerganov/llama.cpp/discussions/4167
-  - A-series: https://github.com/ggerganov/llama.cpp/discussions/4508
+- Support for Gemma models: https://github.com/ggerganov/llama.cpp/pull/5631
+- Non-linear quantization IQ4_NL: https://github.com/ggerganov/llama.cpp/pull/5590
 - Looking for contributions to improve and maintain the `server` example: https://github.com/ggerganov/llama.cpp/issues/4216
 
 ----
```
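The IQ4_NL entry added by this diff refers to codebook-style non-linear 4-bit quantization: instead of mapping a 4-bit index linearly to a value, each index looks up an entry in a small fixed table whose spacing matches the distribution of weights. A minimal sketch of that idea, assuming a hypothetical 16-entry codebook and per-block scale (the actual table and block layout live in ggml's quantization code, not here):

```python
# Hedged sketch of codebook-based non-linear 4-bit quantization, in the
# spirit of the IQ4_NL idea. The codebook values below are illustrative
# assumptions, NOT copied from the llama.cpp / ggml source.
CODEBOOK = [-127, -104, -83, -65, -49, -35, -22, -10,
            1, 13, 25, 38, 53, 69, 89, 113]  # 16 entries -> 4-bit indices


def quantize_block(values):
    """Quantize a block of floats to 4-bit codebook indices plus one scale."""
    amax = max(abs(v) for v in values) or 1.0
    scale = amax / 127.0  # map the block into the codebook's numeric range
    indices = []
    for v in values:
        x = v / scale
        # pick the index of the nearest (non-linearly spaced) codebook entry
        idx = min(range(len(CODEBOOK)), key=lambda i: abs(CODEBOOK[i] - x))
        indices.append(idx)
    return scale, indices


def dequantize_block(scale, indices):
    """Reconstruct approximate floats from scale + 4-bit indices."""
    return [scale * CODEBOOK[i] for i in indices]


scale, idx = quantize_block([0.5, -1.0, 0.25, 0.0])
approx = dequantize_block(scale, idx)
```

The non-linear spacing spends more codebook resolution near zero, where most weight values cluster, which is the motivation for this family of formats over plain linear 4-bit grids.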