path: root/gguf-py/gguf/quants.py
author: fairydreaming <166155368+fairydreaming@users.noreply.github.com> 2024-05-24 14:31:13 +0200
committer: GitHub <noreply@github.com> 2024-05-24 14:31:13 +0200
commit: fbca2f27fc7fa9aa4a8ad0357478fdb908472908
tree: 9226fa114f6e0f6578c6946f5a23c7ab76ef0854 /gguf-py/gguf/quants.py
parent: 0df0aa8e43c3378975269a51f9b876c8692e70da
Add support for ArcticForCausalLM (#7020)
* common : increase max number of experts to 128
* common : add tensor LLM_TENSOR_FFN_NORM_EXPS for normalization before MoE that runs in parallel to attention + ffn
* gguf-py : add architecture-specific block mappings that override selected general block mappings
* convert-hf : add model conversion support for ArcticForCausalLM
* convert-hf : use added_tokens_decoder from tokenizer_config.json to redefine tokens from SentencePiece model (only for ArcticForCausalLM)
* llama : add inference support for LLM_ARCH_ARCTIC

---------

Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>
Diffstat (limited to 'gguf-py/gguf/quants.py')
0 files changed, 0 insertions, 0 deletions