summaryrefslogtreecommitdiff
path: root/ggml-alloc.c
AgeCommit message (Collapse)Author
2023-08-25ggml-alloc : enlarge size of parse_seq (#2776)Shouzheng Liu
Since we also store barriers in this array, we need to double its size.
2023-08-24metal : bug-fix when enable ggml-alloc (#2757)Shouzheng Liu
* metal: better memory alloc w/ concurrency dispatch The ggml-alloc should only free tensors at memory barriers. * ggml-alloc: avoid return silently In certain cases, the allocate_node() function may silently return without performing any memory allocation.
2023-08-23llm : add Falcon support (#2717)Georgi Gerganov
* llama : refactor GGUF constants into static maps * llama : check if model architecture is known * llama : refactor llama_model_load_internal() * gguf : add KV constant maps * llm : read arch-specific KVs * convert : add dummy scores + types * falcon : load tensor data (CPU only) * llama : fix loading progress bar * llama : add arch member to llama_model * falcon : CPU inference working * falcon : support non-40B models * falcon : minor * llama : minor updates ggml-ci * convert-falcon-hf-to-gguf.py : fix special token mapping * llama.cpp : llama default UNK token = id 0 * llama.cpp : fix bpe tokenizer * llama.cpp : fix the fix of bpe tokenizer * ggml : pass eps to ggml_norm * metal : implement RoPE (mode = 2) + avoid ggml_repeat * ggml : ggml_repeat always creates new tensor * falcon : copy-paste self-attention from LLaMA * metal : print extra compute pipeline info * falcon : minor changes (still chasing the Metal problem) * llama.cpp : fix linefeed token * metal : fix GELU kernel numerical stability by using precise::tanh * metal : temporary workaround for the concurrency optimization bug * falcon : add CUDA offloading (#2739) * llama : better model naming and size reporting * llama : prep new tokenizer support * llama : advanced BPE tokenizer based on ggllm.cpp imlpementation * llama : remove oboslete comment ggml-ci * common : remove obsolete BPE API + disable test-tokenizer-1 * llama : revert BPE special-case in llama_byte_to_token() * cuda : add TODOs for RoPE NeoX implementation * llama : default special tokens based on vocab type * perplexity : add log for start of tokenization --------- Co-authored-by: klosax <131523366+klosax@users.noreply.github.com> Co-authored-by: slaren <slarengh@gmail.com>
2023-08-22ggml : sync latest (SAM + SD operators, CUDA alibi) (#2709)Georgi Gerganov
* ggml : sync latest (SAM + SD operators, CUDA alibi) ggml-ci * ggml : fix tabs
2023-08-17ggml-alloc : fix discrepency between measure&eval (#2639)Shouzheng Liu
The GGML memory allocator consistently places a tensor within the optimal-fit memory block, which is the smallest block capable of accommodating the tensor's size. During the measurement phase, the final block is generously sized, ensuring it never qualifies as the optimal-fit block as long as there exists another block capable of accommodating the tensor. Nevertheless, in the evaluation phase, the last block is constrained in size and could potentially qualify as the optimal-fit block. Consequently, there exists the possibility of a tensor being allocated to a different region during evaluation, leading to more memory fragmentation in our scratch buffer. This recent commit guarantees uniform behavior of the allocator across both the measurement and evaluation phases, eliminating discrepancies between the two.
2023-08-16metal : enable ggml-alloc (#2627)Shouzheng Liu
* metal: enable ggml-alloc Make ggml-alloc work with concurrently dispatch. * style-fix Co-authored-by: slaren <slarengh@gmail.com> --------- Co-authored-by: slaren <slarengh@gmail.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-08-09ggml-alloc: Don't try to re-use buffers of external tensors (#2562)Sam Spilsbury
* ggml-alloc: Don't try to re-use buffers of external tensors They might be weights that came from another context, so we have no control over them (and they might be re-used elsewhere so writing to them would be a bad idea). * ggml-alloc: >= when checking for out-of-bounds Co-authored-by: slaren <slarengh@gmail.com> --------- Co-authored-by: slaren <slarengh@gmail.com>
2023-07-30ggml : add graph tensor allocator (#2411)slaren
* ggml : add graph tensor allocator * ggml : don't calculate data pointer of unallocated tensors when creating a view with an offset * ggml : refactor ggml_view_Nd into ggml_view_tensor_offset