Age | Commit message | Author |
|
* ggml : change ggml_scale to take a float instead of a tensor
* ggml : fix CPU implementation
* tests : fix test-grad0
ggml-ci
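A minimal sketch of the updated call, assuming the post-change signature `ggml_scale(ctx, a, float)`; previously the scale factor had to be passed as a 1-element tensor:
```
#include "ggml.h"

int main(void) {
    struct ggml_init_params params = {
        /*.mem_size   =*/ 16*1024*1024,
        /*.mem_buffer =*/ NULL,
        /*.no_alloc   =*/ false,
    };
    struct ggml_context * ctx = ggml_init(params);

    struct ggml_tensor * a = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 8);
    ggml_set_f32(a, 1.0f);

    // new API: the scale factor is a plain float instead of a 1-element tensor
    struct ggml_tensor * b = ggml_scale(ctx, a, 0.125f);

    struct ggml_cgraph * gf = ggml_new_graph(ctx);
    ggml_build_forward_expand(gf, b);
    ggml_graph_compute_with_ctx(ctx, gf, /*n_threads =*/ 1);

    ggml_free(ctx);
    return 0;
}
```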
|
|
|
|
|
|
* [github][workflows][docker]: removes hardcoded `ggerganov` from `ghcr` repo
* [github][workflows][docker]: adds `jlumbroso/free-disk-space`
|
|
* llama : initial ggml-backend integration
* add ggml-metal
* cuda backend can be used through ggml-backend with LLAMA_GGML_BACKEND_CUDA_TEST
access all tensor data with ggml_backend_tensor_get/set
* add ggml_backend_buffer_clear
zero-init KV cache buffer
* add ggml_backend_buffer_is_host, used to avoid copies if possible when accessing tensor data (see the sketch after this commit)
* disable gpu backends with ngl 0
* more accurate mlock
* unmap offloaded part of the model
* use posix_fadvise64(.., POSIX_FADV_SEQUENTIAL) to improve performance with mmap
* update quantize and lora
* update session copy/set to use ggml-backend
ggml-ci
* use posix_fadvise instead of posix_fadvise64
* ggml_backend_alloc_ctx_tensors_from_buft : remove old print
* llama_mmap::align_offset : use pointers instead of references for out parameters
* restore progress_callback behavior
* move final progress_callback call to load_all_data
* cuda : fix fprintf format string (minor)
* do not offload scales
* llama_mmap : avoid unmapping the same fragments again in the destructor
* remove unnecessary unmap
* metal : add default log function that prints to stderr, cleanup code
ggml-ci
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
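A small illustrative helper built on the calls named above (`ggml_backend_buffer_is_host`, `ggml_backend_tensor_get`); the helper itself, `read_tensor_data`, is hypothetical:
```
#include "ggml.h"
#include "ggml-backend.h"
#include <string.h>

// Hypothetical helper: read a tensor's data regardless of which backend owns it.
// When the buffer is host-accessible the data pointer is used directly (no copy);
// otherwise ggml_backend_tensor_get copies it out of the device buffer.
static void read_tensor_data(const struct ggml_tensor * t, void * dst) {
    const size_t nbytes = ggml_nbytes(t);
    if (t->buffer != NULL && ggml_backend_buffer_is_host(t->buffer)) {
        memcpy(dst, t->data, nbytes);
    } else {
        ggml_backend_tensor_get(t, dst, 0, nbytes);
    }
}
```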
|
|
* allowed getting n_batch from llama_context in c api
* changed to use `uint32_t` instead of `int`
* changed to use `uint32_t` instead of `int` in `llama_n_ctx`
* Update llama.h
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
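A small usage sketch, assuming `llama_n_batch` and the updated `uint32_t` return type of `llama_n_ctx` described above:
```
#include "llama.h"
#include <stdint.h>
#include <stdio.h>

static void print_ctx_limits(const struct llama_context * ctx) {
    const uint32_t n_ctx   = llama_n_ctx(ctx);   // now returns uint32_t
    const uint32_t n_batch = llama_n_batch(ctx); // newly exposed in the C API
    printf("n_ctx = %u, n_batch = %u\n", n_ctx, n_batch);
}
```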
|
|
|
|
* AMD ROCm: handle UMA memory VRAM expansions
This resolves #2797 by allowing ROCm AMD GPU users with a UMA to
dynamically expand the VRAM allocated to the GPU.
Without this, AMD ROCm users with shared CPU/GPU memory are usually
stuck with the BIOS-set (or fixed) framebuffer VRAM, making it
impossible to load more than 1-2 layers.
Note that the model is duplicated in RAM because it's loaded once for
the CPU and then copied into a second set of allocations that are
managed by the HIP UMA system. We can fix this later.
* clarify build process for ROCm on linux with cmake
* avoid using deprecated ROCm hipMallocHost
* keep simplifying the change required for UMA
* cmake: enable UMA-compatible allocation when LLAMA_HIP_UMA=ON
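A rough sketch of the UMA-aware mapping this describes, in the spirit of the existing hip-to-cuda compatibility defines in ggml-cuda.cu; the exact macro set used upstream is an assumption here:
```
#if defined(GGML_USE_HIPBLAS)
  #if defined(GGML_HIP_UMA)
    // UMA build: allocate through managed memory so the "VRAM" can grow into shared system RAM
    #define cudaMalloc hipMallocManaged
  #else
    // regular build: plain device (VRAM) allocation
    #define cudaMalloc hipMalloc
  #endif
  // use hipHostMalloc instead of the deprecated hipMallocHost for pinned host memory
  #define cudaMallocHost(ptr, size) hipHostMalloc(ptr, size, hipHostMallocDefault)
#endif
```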
|
|
Regression of 139882392258671ffe5acdfcadc0bc08572d6eef
HIP doesn't have trap, only abort
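Presumably the fix boils down to a compatibility define along these lines (the exact form is an assumption):
```
#if defined(GGML_USE_HIPBLAS)
// HIP device code has no __trap() intrinsic, so fall back to abort()
#define __trap() abort()
#endif
```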
|
|
|
|
|
|
|
|
Otherwise, on Windows converting bling-phi-2-v0 (<https://huggingface.co/llmware/bling-phi-2-v0>) via convert-hf-to-gguf.py will fail with the following error:
```
Traceback (most recent call last):
File "C:\Users\User\git\gguf\convert-hf-to-gguf.py", line 1061, in <module>
model_instance.set_vocab()
File "C:\Users\User\git\gguf\convert-hf-to-gguf.py", line 52, in set_vocab
self._set_vocab_gpt2()
File "C:\Users\User\git\gguf\convert-hf-to-gguf.py", line 264, in _set_vocab_gpt2
special_vocab = gguf.SpecialVocab(dir_model, load_merges=True)
File "C:\Users\User\git\gguf\gguf\vocab.py", line 33, in __init__
self._load(Path(path))
File "C:\Users\User\git\gguf\gguf\vocab.py", line 81, in _load
self._try_load_merges_txt(path)
File "C:\Users\User\git\gguf\gguf\vocab.py", line 95, in _try_load_merges_txt
for line in fp:
File "C:\Users\User\miniconda3\envs\gguf\lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 1415: character maps to <undefined>
```
|
|
* Update ggml-cuda.cu
* Update ggml-cuda.cu
* Update ggml-cuda.cu
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
|
|
* cuda : replace asserts in wrong architecture checks with __trap
* make bad_arch noreturn, remove returns
|
|
|
|
|
|
* CUDA: make MoE tensors contiguous for batch size>1
* Update ggml-cuda.cu
Co-authored-by: slaren <slarengh@gmail.com>
---------
Co-authored-by: slaren <slarengh@gmail.com>
|
|
Co-authored-by: Eric Sommerlade <ersomme@microsoft.com>
|
|
regression of #4490
Adds defines for two new datatypes:
cublasComputeType_t and cudaDataType_t.
Currently using the deprecated hipblasDatatype_t since the newer ones are very recent.
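Presumably the new defines map the CUDA-side type names onto the HIP type that is still widely available, e.g. something like this (the exact mapping is an assumption):
```
#if defined(GGML_USE_HIPBLAS)
// map the CUDA-side type names used in the cuBLAS calls onto hipBLAS;
// hipblasDatatype_t is deprecated, but its replacements are too recent to require yet
#define cublasComputeType_t hipblasDatatype_t
#define cudaDataType_t      hipblasDatatype_t
#endif
```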
|
|
|
|
|
|
* phi2 implementation
* fix breaking change
* phi-2 : various fixes
* phi-2 : use layer norm eps
* py : whitespaces
* llama : fix meta KV override bug
* convert : phi don't add BOS token
* convert : revert "added_tokens_decoder" change
* phi-2 : scale Q instead of KQ for better precision
* ggml : fix NeoX rope to rotate just first n_dims
* cuda : less diff in the rope_neox kernel
* ggml : add ggml_mul_mat_set_prec (see the sketch after this commit)
ggml-ci
* Update ggml-cuda.cu
Co-authored-by: slaren <slarengh@gmail.com>
* Update ggml-cuda.cu
Co-authored-by: slaren <slarengh@gmail.com>
* cuda : ggml_cuda_op_mul_mat_cublas support F32 precision
* cuda : remove obsolete comment
---------
Co-authored-by: Ebey Abraham <ebeyabraham@microsoft.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: slaren <slarengh@gmail.com>
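A sketch of the two attention changes called out above ("scale Q instead of KQ" and `ggml_mul_mat_set_prec`), assuming the ggml API added here; the helper and tensor names are illustrative:
```
#include "ggml.h"

// build the attention scores with the Q scaling applied up front and the
// K*Q matmul forced to F32 accumulation, as described in the commit
static struct ggml_tensor * attn_scores(struct ggml_context * ctx,
                                        struct ggml_tensor * k,
                                        struct ggml_tensor * q,
                                        float kq_scale) {
    q = ggml_scale(ctx, q, kq_scale);          // scale Q instead of scaling KQ afterwards
    struct ggml_tensor * kq = ggml_mul_mat(ctx, k, q);
    ggml_mul_mat_set_prec(kq, GGML_PREC_F32);  // higher precision for this matmul
    return kq;
}
```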
|
|
|
|
|
|
|
|
* llama.swiftui : add bench button
* llama.swiftui : initial bench functionality
* force to use n_gpu_layers on simulator
* add download buttons & expose llamaState.loadModel
* update project.pbxproj
* comment #Preview & fix editorconfig check
* gitignore : xcode stuff
* llama.swiftui : UX improvements
* llama.swiftui : avoid data copy via "downloadTask"
* llama.swiftui : remove model from project
* llama : remove "mostly" from model infos
* llama.swiftui : improve bench
---------
Co-authored-by: jhen <developer@jhen.me>
|
|
|
|
* build : Check the ROCm installation location
* more generic approach
* fixup! It was returning the path instead of the command output
* fixup! Trailing whitespace
|
|
|
|
|
|
Fix bug in identifying the grammar.
|
|
|
|
|
|
|
|
* lora : add support for non-llama models
ggml-ci
* avoid leaking ggml_context on failure
cleanup
ggml-ci
* lora : allow 1d tensors
* lora : include embd and output layers in size calculation
* fix style
|
|
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
|
|
* Add API key authentication for enhanced server-client security
* server : to snake_case
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
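The server code itself is C++, but the check this adds presumably amounts to something like the following (illustrative only; the header format and helper name are assumptions):
```
#include <stdbool.h>
#include <string.h>

// accept a request only when the Authorization header carries the configured key
// as "Bearer <key>"; with no key configured, authentication is disabled
static bool check_api_key(const char * auth_header, const char * api_key) {
    if (api_key == NULL || api_key[0] == '\0') {
        return true;
    }
    const char * prefix = "Bearer ";
    if (auth_header == NULL || strncmp(auth_header, prefix, strlen(prefix)) != 0) {
        return false;
    }
    return strcmp(auth_header + strlen(prefix), api_key) == 0;
}
```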
|
|
* ggml : group mul_mat_id rows by matrix (cpu only)
* remove mmid parameters from mm forward
* store row groups in wdata and calculate only once in GGML_TASK_INIT
ggml-ci
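An illustrative sketch of the row-grouping idea (not the actual ggml code): during GGML_TASK_INIT, bucket the rows by the expert matrix their id selects, so each matrix is applied to all of its rows at once:
```
#include <stdint.h>

// group row indices by the matrix (expert) id they select; row_groups holds the
// row indices for matrix m starting at row_groups[m*n_rows], group_sizes[m] their count
static void group_rows_by_matrix(const int32_t * ids, int n_rows, int n_mats,
                                 int * row_groups, int * group_sizes) {
    for (int m = 0; m < n_mats; ++m) {
        group_sizes[m] = 0;
    }
    for (int r = 0; r < n_rows; ++r) {
        const int m = ids[r];
        row_groups[m*n_rows + group_sizes[m]++] = r;
    }
}
```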
|
|
* ggml : use ggml_row_size where possible
ggml-ci
* ggml : move ggml_nbytes_split to ggml-cuda.cu
|
|
ggml-ci
|
|
|
|
* Fixes "Not enough space in the context's memory pool" encountered on certain models, which seems to be caused by some imprecision related to the automatic casting of floating point values
* do not cast to size_t, instead just use doubles
* ggml : add ggml_row_size(), deprecate ggml_type_sizef()
* ggml : fix row size compute to avoid overflows
* tests : fix sizey -> sizez
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
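A small sketch of the new helper, assuming the `ggml_row_size(type, n_elements)` signature described above; it keeps the size computation in integer arithmetic instead of the float-based `ggml_type_sizef`:
```
#include "ggml.h"
#include <stdio.h>

int main(void) {
    const int64_t n_embd = 4096;
    const int64_t n_rows = 32000;

    // bytes for one row of n_embd Q4_K elements, then for the whole matrix
    const size_t row_bytes   = ggml_row_size(GGML_TYPE_Q4_K, n_embd);
    const size_t total_bytes = row_bytes*(size_t) n_rows;

    printf("row = %zu bytes, total = %zu bytes\n", row_bytes, total_bytes);
    return 0;
}
```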
|
|
|
|
* Add HFVocab into convert.py
* Update convert.py
* Update convert.py
* add bytes_to_unicode function
* change add_meta_vocab function
* remove debug code
* remove byte_encoder
* Add newline between classes
* Check tokenizer.json when tokenizer.model does not exist.
* Move transformers dependency to local code
* Add error context with 'raise from'
* Add fast tokenizer option to BpeVocab
* Update convert.py
* Add VocabLoader and remove *Vocab class
* Add transformers dependency
* remove added tokens and check newline token to decide spm or bpe
* Update convert.py
* Add special token type
* Update convert.py
* Update convert.py
* Update convert.py
* Fix typo in convert.py
* Fix when params.n_vocab < tokenizer vocab size
* update vocab class
* change function name
* Remove unused variables/functions, add types to class variables and methods, delete blank lines
* fix flake8 warnings
* code style cleanup
* make mypy happy
* change exception
---------
Co-authored-by: Jared Van Bortel <jared@nomic.ai>
|
|
|
|
(#4446)
|
|
* sync : ggml (SD ops, tests, kernels)
ggml-ci
* cuda : restore im2col
ggml-ci
* metal : fix accuracy of dequantization kernels
ggml-ci
* cuda : restore correct im2col
ggml-ci
* metal : try to fix moe test by reducing expert size
ggml-ci
* cuda : fix bin bcast when src1 and dst have different types
ggml-ci
---------
Co-authored-by: slaren <slarengh@gmail.com>
|
|
|
|
|