refact : fix convert script + zero out KV cache to avoid nans (#3523)

* refact : fix convert script + zero out KV cache to avoid nans

* ggml : silu(-inf) should never happen

* metal : assert various kernel requirements
author: Georgi Gerganov <ggerganov@gmail.com> 2023-10-09 14:32:17 +0300
committer: GitHub <noreply@github.com> 2023-10-09 14:32:17 +0300
commit: fcca0a700487999d52a525c96d6661e9f6a8703a (patch)
tree: edf07ca2f40aa95e40b5f6863322ea0293467592 /examples/parallel/parallel.cpp
parent: dcc09d25961c5d0626bc148e558ee841141748f7 (diff)
1 files changed, 1 insertions, 1 deletions
diff --git a/examples/parallel/parallel.cpp b/examples/parallel/parallel.cpp
index 721888da..04f1e45b 100644
--- a/examples/parallel/parallel.cpp
+++ b/examples/parallel/parallel.cpp
@@ -167,7 +167,7 @@ int main(int argc, char ** argv) {
 
     // the max batch size is as large as the context to handle cases where we get very long input prompt from multiple
     // users. regardless of the size, the main loop will chunk the batch into a maximum of params.n_batch tokens at a time
-    llama_batch batch = llama_batch_init(params.n_ctx, 0);
+    llama_batch batch = llama_batch_init(n_ctx, 0);
 
     int32_t n_total_prompt = 0;
     int32_t n_total_gen    = 0;
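
The comment in the hunk above says the batch is allocated as large as the context, while the main loop submits at most `params.n_batch` tokens at a time. A minimal sketch of that chunking arithmetic follows. It is not the code from `parallel.cpp`: the helper name `chunk_ranges` and its parameters are hypothetical, and it calls no llama.cpp functions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper (not part of llama.cpp): split n_tokens queued tokens
// into consecutive (offset, count) ranges of at most n_batch tokens each.
std::vector<std::pair<int32_t, int32_t>> chunk_ranges(int32_t n_tokens, int32_t n_batch) {
    assert(n_batch > 0); // a zero or negative chunk size would never advance

    std::vector<std::pair<int32_t, int32_t>> chunks;
    for (int32_t i = 0; i < n_tokens; i += n_batch) {
        // the last chunk may be shorter than n_batch
        const int32_t n = std::min(n_batch, n_tokens - i);
        chunks.emplace_back(i, n);
    }
    return chunks;
}
```

Under these assumptions, a batch holding 10 tokens with a chunk size of 4 is split into the ranges (0, 4), (4, 4) and (8, 2), so the allocation size of the batch and the per-call limit stay independent.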