author    Carolinabanana <140120812+Carolinabanana@users.noreply.github.com>  2024-04-09 09:16:13 +0100
committer GitHub <noreply@github.com>  2024-04-09 11:16:13 +0300
commit    5dc9dd7152dedc6046b646855585bd070c91e8c8 (patch)
tree      d2bae3652d91cdd9327e28fa85d167a67e050c53 /ggml-cuda/convert.cuh
parent    e11a8999b5690f810c2c99c14347f0834e68c524 (diff)
llama : add Command R Plus support (#6491)
* Add Command R Plus GGUF
* Loading works up to LayerNorm2D
* Export new tensors in 1D so they are not quantized.
* Fix embedding layer based on Noeda's example
* Whitespace
* Add line
* Fix unexpected tokens on MPS. Re-add F16 fix. (Noeda)
* dranger003: Fix block index overflow in CUDA dequantizing.
* Revert blocked multiplication code, as it still has issues and could affect other Llama arches
* Export norms as f32
* Fix overflow issues during quant and other cleanup
* Type convention

  Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* dranger003: Fix more int overflow during quant.

---------

Co-authored-by: S <seast@Ss-Mac-Studio.local>
Co-authored-by: S <s@example.com>
Co-authored-by: slaren <slarengh@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
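For context on the overflow fixes listed above, a minimal sketch (not part of the commit): with 32-bit int indexing, any tensor holding more than INT32_MAX (2,147,483,647) elements wraps around, and a model of Command R Plus's size crosses that threshold in a single tensor. The dimensions below are hypothetical, for illustration only:

#include <cstdint>
#include <cstdio>

int main() {
    // Hypothetical tensor shape (vocab x embedding), illustration only.
    const int64_t ne = 256000LL * 12288LL; // 3,145,728,000 elements
    // INT32_MAX is 2,147,483,647, so a 32-bit element index overflows here.
    std::printf("%lld > %d\n", (long long) ne, INT32_MAX);
    return 0;
}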
Diffstat (limited to 'ggml-cuda/convert.cuh')
-rw-r--r--  ggml-cuda/convert.cuh  2
1 file changed, 1 insertion, 1 deletion
diff --git a/ggml-cuda/convert.cuh b/ggml-cuda/convert.cuh
index db34c0be..5394be9f 100644
--- a/ggml-cuda/convert.cuh
+++ b/ggml-cuda/convert.cuh
@@ -3,7 +3,7 @@
#define CUDA_DEQUANTIZE_BLOCK_SIZE 256
template<typename T>
-using to_t_cuda_t = void (*)(const void * __restrict__ x, T * __restrict__ y, int k, cudaStream_t stream);
+using to_t_cuda_t = void (*)(const void * __restrict__ x, T * __restrict__ y, int64_t k, cudaStream_t stream);
typedef to_t_cuda_t<float> to_fp32_cuda_t;
typedef to_t_cuda_t<half> to_fp16_cuda_t;
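To see why the wider k matters, here is a minimal sketch of the kind of dequantize launcher this typedef describes. The kernel and launcher names are hypothetical (the real kernels live in ggml-cuda/convert.cu); the point is that the element count and the indices derived from it stay 64-bit:

#include <cuda_runtime.h>
#include <cstdint>

#define CUDA_DEQUANTIZE_BLOCK_SIZE 256 // as defined in convert.cuh

// Hypothetical kernel: copies f32 data as a stand-in for real block decoding.
static __global__ void dequantize_f32(const void * __restrict__ x, float * __restrict__ y, const int64_t k) {
    // Compute the global index in 64 bits: blockIdx.x * blockDim.x alone
    // can exceed INT_MAX for tensors with more than ~2.1e9 elements.
    const int64_t i = (int64_t) blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= k) {
        return;
    }
    y[i] = ((const float *) x)[i];
}

// A launcher matching the updated to_fp32_cuda_t signature (int64_t k).
static void dequantize_f32_cuda(const void * x, float * y, const int64_t k, cudaStream_t stream) {
    const int64_t num_blocks = (k + CUDA_DEQUANTIZE_BLOCK_SIZE - 1) / CUDA_DEQUANTIZE_BLOCK_SIZE;
    dequantize_f32<<<(unsigned int) num_blocks, CUDA_DEQUANTIZE_BLOCK_SIZE, 0, stream>>>(x, y, k);
}

Assigning dequantize_f32_cuda to a to_fp32_cuda_t pointer now requires the int64_t parameter, which is how the one-line typedef change propagates through every conversion launcher.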