author | Meng Zhang <meng@tabbyml.com> | 2023-09-16 03:02:13 +0800 |
---|---|---|
committer | GitHub <noreply@github.com> | 2023-09-15 22:02:13 +0300 |
commit | 4fe09dfe665c58a753dc9eb638dd4dca1cd35488 (patch) | |
tree | 8bde812820738105894d6c179c3b3615b5c06481 /examples/simple/simple.cpp | |
parent | 80291a1d02a07f7f66666fb576c5b1e75aa48b46 (diff) |
llama : add support for StarCoder model architectures (#3187)
* add placeholder of starcoder in gguf / llama.cpp
* support convert starcoder weights to gguf
* convert MQA to MHA
* fix ffn_down name
* add LLM_ARCH_STARCODER to llama.cpp
* set head_count_kv = 1
* load starcoder weight
* add max_position_embeddings
* set n_positions to max_position_embeddings
* properly load all starcoder params
* fix head count kv
* fix comments
* fix vram calculation for starcoder
* store mqa directly
* add input embeddings handling
* add TBD
* working in cpu, metal buggy
* cleanup useless code
* metal : fix out-of-bounds access in soft_max kernels
* llama : make starcoder graph build more consistent with others
* refactor: cleanup comments a bit
* add other starcoder models: 3B, 7B, 15B
* support-mqa-directly
* fix: remove max_position_embeddings, use n_train_ctx
* Update llama.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Update llama.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Apply suggestions from code review
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* fix: switch to space from tab
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Diffstat (limited to 'examples/simple/simple.cpp')
0 files changed, 0 insertions, 0 deletions