author Meng Zhang <meng@tabbyml.com> 2023-09-16 03:02:13 +0800
committer GitHub <noreply@github.com> 2023-09-15 22:02:13 +0300
commit 4fe09dfe665c58a753dc9eb638dd4dca1cd35488
tree 8bde812820738105894d6c179c3b3615b5c06481 /examples/server
parent 80291a1d02a07f7f66666fb576c5b1e75aa48b46
llama : add support for StarCoder model architectures (#3187)
* add placeholder of starcoder in gguf / llama.cpp
* support convert starcoder weights to gguf
* convert MQA to MHA
* fix ffn_down name
* add LLM_ARCH_STARCODER to llama.cpp
* set head_count_kv = 1
* load starcoder weight
* add max_position_embeddings
* set n_positions to max_positioin_embeddings
* properly load all starcoder params
* fix head count kv
* fix comments
* fix vram calculation for starcoder
* store mqa directly
* add input embeddings handling
* add TBD
* working in cpu, metal buggy
* cleanup useless code
* metal : fix out-of-bounds access in soft_max kernels
* llama : make starcoder graph build more consistent with others
* refactor: cleanup comments a bit
* add other starcoder models: 3B, 7B, 15B
* support-mqa-directly
* fix: remove max_position_embeddings, use n_train_ctx
* Update llama.cpp

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Update llama.cpp

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Apply suggestions from code review

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* fix: switch to space from tab

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Diffstat (limited to 'examples/server')
0 files changed, 0 insertions, 0 deletions