Age | Commit message | Author
2023-12-23 | ci(docker): fix tags in "Build and push docker image (tagged)" (#4603) | Samuel Maynard
2023-12-23 | server : allow to specify custom prompt for penalty calculation (#3727) | Alexey Parfenov
2023-12-23 | grammar : check the full vocab only if necessary (opt) (#4306) | kalomaze
* Check the full vocab for grammar only if necessary
* Fix missing logit restoration step (?) Does this matter, actually?
* Fix whitespace / formatting
* Adjust comment
* Didn't mean to push test gbnf
* Split sampling into the helper function (?) And also revert the changes made to the header
* common : fix final newline
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
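A minimal sketch of the optimization above, written against the llama.h sampling API of this period (llama_sample_token, llama_sample_grammar); the helper name and the backup strategy are illustrative, not the actual patch. The idea: sample first, check only the sampled token against the grammar, and rerun the full-vocabulary grammar pass only if that token is rejected.

```cpp
#include "llama.h"
#include <algorithm>
#include <cmath>
#include <vector>

static llama_token sample_with_lazy_grammar(
        llama_context * ctx,
        llama_token_data_array * candidates, // logits already collected into candidates
        llama_grammar * grammar) {
    // back up the candidates, since sampling sorts/softmaxes them in place
    std::vector<llama_token_data> backup(candidates->data, candidates->data + candidates->size);

    // fast path: sample without touching the grammar
    const llama_token id = llama_sample_token(ctx, candidates);

    // check only the sampled token: if the grammar rejects it, its logit becomes -INFINITY
    llama_token_data       single     = { id, 0.0f, 0.0f };
    llama_token_data_array single_arr = { &single, 1, false };
    llama_sample_grammar(ctx, &single_arr, grammar);
    if (single.logit > -INFINITY) {
        return id; // accepted: the full-vocabulary grammar pass was skipped
    }

    // slow path: restore the candidates, constrain the whole vocabulary, resample
    std::copy(backup.begin(), backup.end(), candidates->data);
    candidates->size   = backup.size();
    candidates->sorted = false;
    llama_sample_grammar(ctx, candidates, grammar);
    return llama_sample_token(ctx, candidates);
}
```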
2023-12-23 | CUDA: fixed row rounding for 0 tensor splits (#4594) | Johannes Gäßler
2023-12-22 | lookup : add prompt lookup decoding example (#4484) | LeonEricsson
* initial commit, going through initializations
* main loop finished, starting to debug
* BUG: generates gibberish/repeating tokens after a while
* kv_cache management
* Added colors to distinguish drafted tokens (--color). Updated README
* lookup : fix token positions in the draft batch
* lookup : use n_draft from CLI params
* lookup : final touches
---------
Co-authored-by: Leon Ericsson <leon.ericsson@icloud.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
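A minimal sketch of the idea behind prompt lookup decoding (the function and variable names are illustrative, not the example's actual code): find the most recent earlier occurrence of the context's final n-gram and draft the tokens that followed it; the main model then verifies the draft as in regular speculative decoding.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

using llama_token = int32_t;

static std::vector<llama_token> prompt_lookup_draft(
        const std::vector<llama_token> & ctx_tokens, // prompt + tokens generated so far
        int ngram_size,                              // length of the n-gram to match, e.g. 3
        int n_draft) {                               // maximum number of tokens to draft
    const int n = (int) ctx_tokens.size();
    if (n < ngram_size + 1) {
        return {};
    }
    const llama_token * tail = ctx_tokens.data() + n - ngram_size; // the n-gram to look for

    // scan backwards so the most recent match wins
    for (int i = n - ngram_size - 1; i >= 0; --i) {
        bool match = true;
        for (int j = 0; j < ngram_size; ++j) {
            if (ctx_tokens[i + j] != tail[j]) { match = false; break; }
        }
        if (match) {
            const int start = i + ngram_size;               // first token after the match
            const int end   = std::min(start + n_draft, n); // do not run past the context
            return std::vector<llama_token>(ctx_tokens.begin() + start, ctx_tokens.begin() + end);
        }
    }
    return {}; // no match: fall back to normal decoding for this step
}
```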
2023-12-22 | sync : ggml (fix im2col) (#4591) | Georgi Gerganov
* cuda : fix im2col_f32_f16 (ggml/#658) ggml-ci
* ggml-alloc : fix ggml_tallocr_is_own
---------
Co-authored-by: leejet <leejet714@gmail.com>
2023-12-22 | cuda : fix jetson compile error (#4560) | FantasyGmm
* fix old jetson compile error
* Update Makefile
* update jetson detect and cuda version detect
* update cuda macro define
* update makefile and cuda, fix some issues
* Update README.md
  Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Update Makefile
* Update README.md
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-12-22 | Fix CudaMemcpy direction (#4599) | Henrik Forstén
2023-12-22 | llama : fix platforms without mmap (#4578) | slaren
* llama : fix platforms without mmap
* win32 : limit prefetch size to the file size
* fix win32 error clobber, unnecessary std::string in std::runtime_error
2023-12-22 | ggml : add comment about backward GGML_OP_DIAG_MASK_INF (#4203) | Herman Semenov
2023-12-22 | make : add LLAMA_HIP_UMA option (#4587) | Michael Kesper
NB: setting LLAMA_HIP_UMA=1 (or any value) adds -DGGML_HIP_UMA to MK_CPPFLAGS
2023-12-22 | ci : tag docker image with build number (#4584) | rhuddleston
2023-12-22 | readme : add zig bindings (#4581) | Deins
2023-12-22 | ggml : extend `enum ggml_log_level` with `GGML_LOG_LEVEL_DEBUG` (#4579) | bobqianic
2023-12-22 | llama : add ability to cancel model loading (#4462) | crasm
* llama : Add ability to cancel model load
  Updated llama_progress_callback so that if it returns false, the model loading is aborted.
* llama : Add test for model load cancellation
* Fix bool return in llama_model_load, remove std::ignore use
* Update llama.cpp
  Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>
* Fail test if model file is missing
* Revert "Fail test if model file is missing"
  This reverts commit 32ebd525bf7e5a87ee8a3dbaab3d92ce79fbf23d.
* Add test-model-load-cancel to Makefile
* Revert "Revert "Fail test if model file is missing""
  This reverts commit 2796953257ee5383fa7c8fe8fa8fc888c048fb0b.
* Simplify .gitignore for tests, clang-tidy fixes
* Label all ctest tests
* ci : ctest uses -L main
* Attempt at writing ctest_with_model
* ci : get ci/run.sh working with test-model-load-cancel
* ci : restrict .github/workflows/build.yml ctest to -L main
* update requirements.txt
* Disable test-model-load-cancel in make
* Remove venv before creation
* Restructure requirements.txt
  Top-level now imports the specific additional requirements for each python file. Using `pip install -r requirements.txt` will fail if versions become mismatched in the per-file requirements.
* Make per-python-script requirements work alone
  This doesn't break the main requirements.txt.
* Add comment
* Add convert-persimmon-to-gguf.py to new requirements.txt scheme
* Add check-requirements.sh script and GitHub workflow
* Remove shellcheck installation step from workflow
* Add nocleanup special arg
* Fix merge
  see: https://github.com/ggerganov/llama.cpp/pull/4462#discussion_r1434593573
* reset to upstream/master
* Redo changes for cancelling model load
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>
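A minimal sketch of the new cancellation path: with this change llama_progress_callback returns bool, and returning false aborts the load, so llama_load_model_from_file returns NULL. The global cancellation flag and the percentage printout here are illustrative.

```cpp
#include "llama.h"
#include <atomic>
#include <cstdio>

static std::atomic<bool> g_cancel_load{false}; // set from a signal handler or another thread

int main(int argc, char ** argv) {
    if (argc < 2) {
        fprintf(stderr, "usage: %s <model.gguf>\n", argv[0]);
        return 1;
    }

    llama_model_params mparams = llama_model_default_params();
    mparams.progress_callback = [](float progress, void * /*user_data*/) -> bool {
        fprintf(stderr, "\rloading: %5.1f%%", progress * 100.0f);
        return !g_cancel_load.load(); // returning false cancels the load
    };

    llama_model * model = llama_load_model_from_file(argv[1], mparams);
    if (model == NULL) {
        fprintf(stderr, "\nmodel load failed or was cancelled\n");
        return 1;
    }

    llama_free_model(model);
    return 0;
}
```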
2023-12-21 | ggml : change ggml_scale to take a float instead of tensor (#4573) | Georgi Gerganov
* ggml : change ggml_scale to take a float instead of tensor
* ggml : fix CPU implementation
* tests : fix test-grad0 ggml-ci
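A minimal sketch of the API change (the wrapper function is illustrative): ggml_scale now takes the scale factor directly as a float instead of a 1-element tensor.

```c
#include "ggml.h"

// before: cur = ggml_scale(ctx, cur, ggml_new_f32(ctx, 0.125f));
// after:
struct ggml_tensor * scale_by_eighth(struct ggml_context * ctx, struct ggml_tensor * cur) {
    return ggml_scale(ctx, cur, 0.125f);
}
```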
2023-12-21 | gguf-py : fix broken link | Georgi Gerganov
2023-12-21 | gguf : simplify example dependencies | Georgi Gerganov
2023-12-21 | ci : add `jlumbroso/free-disk-space` to docker workflow (#4150) | Samuel Maynard
* [github][workflows][docker]: removes hardcoded `ggerganov` from `ghcr` repo
* [github][workflows][docker]: adds `jlumbroso/free-disk-space`
2023-12-21 | llama : initial ggml-backend integration (#4520) | slaren
* llama : initial ggml-backend integration
* add ggml-metal
* cuda backend can be used through ggml-backend with LLAMA_GGML_BACKEND_CUDA_TEST; access all tensor data with ggml_backend_tensor_get/set
* add ggml_backend_buffer_clear; zero-init KV cache buffer
* add ggml_backend_buffer_is_host, used to avoid copies if possible when accessing tensor data
* disable gpu backends with ngl 0
* more accurate mlock
* unmap offloaded part of the model
* use posix_fadvise64(.., POSIX_FADV_SEQUENTIAL) to improve performance with mmap
* update quantize and lora
* update session copy/set to use ggml-backend ggml-ci
* use posix_fadvise instead of posix_fadvise64
* ggml_backend_alloc_ctx_tensors_from_buft : remove old print
* llama_mmap::align_offset : use pointers instead of references for out parameters
* restore progress_callback behavior
* move final progress_callback call to load_all_data
* cuda : fix fprintf format string (minor)
* do not offload scales
* llama_mmap : avoid unmapping the same fragments again in the destructor
* remove unnecessary unmap
* metal : add default log function that prints to stderr, cleanup code ggml-ci
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
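A minimal sketch of the data-access pattern mentioned above (the helper and the fixed 64-float buffer are illustrative): tensor data goes through ggml_backend_tensor_get/set so the same code works whether the buffer lives in host memory or on a GPU backend, and ggml_backend_buffer_is_host allows skipping the copy when it is not needed.

```c
#include "ggml.h"
#include "ggml-backend.h"
#include <string.h>

void copy_first_values(struct ggml_tensor * src, struct ggml_tensor * dst) {
    float tmp[64];
    const size_t nbytes = sizeof(tmp);

    if (ggml_backend_buffer_is_host(src->buffer)) {
        memcpy(tmp, src->data, nbytes);               // host buffer: direct access, no extra copy
    } else {
        ggml_backend_tensor_get(src, tmp, 0, nbytes); // device buffer: explicit device -> host read
    }
    ggml_backend_tensor_set(dst, tmp, 0, nbytes);     // write goes to wherever dst lives
}
```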
2023-12-21 | llama : allow getting n_batch from llama_context in c api (#4540) | Marcus Dunn
* allowed getting n_batch from llama_context in c api
* changed to use `uint32_t` instead of `int`
* changed to use `uint32_t` instead of `int` in `llama_n_ctx`
* Update llama.h
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
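A minimal sketch of the new accessor (the print helper is illustrative): n_batch can now be read back from an existing llama_context, and both it and llama_n_ctx return uint32_t.

```c
#include "llama.h"
#include <stdint.h>
#include <stdio.h>

void print_ctx_limits(const struct llama_context * ctx) {
    const uint32_t n_ctx   = llama_n_ctx(ctx);
    const uint32_t n_batch = llama_n_batch(ctx);
    printf("n_ctx = %u, n_batch = %u\n", n_ctx, n_batch);
}
```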
2023-12-21 | metal : fix `ggml_metal_log` vargs (#4373) | Finn Voorhees
2023-12-21 | cuda : ROCm AMD Unified Memory Architecture (UMA) handling (#4449) | Erik Garrison
* AMD ROCm: handle UMA memory VRAM expansions
  This resolves #2797 by allowing ROCm AMD GPU users with a UMA to dynamically expand the VRAM allocated to the GPU. Without this, AMD ROCm users with shared CPU/GPU memory are usually stuck with the BIOS-set (or fixed) framebuffer VRAM, making it impossible to load more than 1-2 layers. Note that the model is duplicated in RAM because it is loaded once for the CPU and then copied into a second set of allocations that are managed by the HIP UMA system. We can fix this later.
* clarify build process for ROCm on linux with cmake
* avoid using deprecated ROCm hipMallocHost
* keep simplifying the change required for UMA
* cmake: enable UMA-compatible allocation when LLAMA_HIP_UMA=ON
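A minimal sketch of the UMA-aware allocation described above (the wrapper name is an assumption, not the verbatim patch): when GGML_HIP_UMA is defined, buffers are allocated as HIP managed memory, so they can grow beyond the fixed framebuffer carve-out on shared-memory APUs instead of failing once the reported VRAM is exhausted.

```c
#include <hip/hip_runtime.h>

static hipError_t ggml_hip_alloc(void ** ptr, size_t size) {
#if defined(GGML_HIP_UMA)
    return hipMallocManaged(ptr, size); // unified memory: pages migrate between CPU RAM and GPU on demand
#else
    return hipMalloc(ptr, size);        // regular device memory: limited to the reported VRAM
#endif
}
```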
2023-12-21 | ggml-cuda: Fix HIP build by adding define for __trap (#4569) | arlo-phoenix
Regression of 139882392258671ffe5acdfcadc0bc08572d6eef: HIP doesn't have __trap, only abort.
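One way to provide the missing intrinsic, as a hedged sketch (the guard macro and the exact form used in the actual patch may differ): map __trap() onto device-side abort() when the CUDA sources are built for HIP/ROCm.

```cpp
#if defined(GGML_USE_HIPBLAS)
#define __trap() abort()
#endif
```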
2023-12-21 | common : remove incorrect --model-draft default (#4568) | Jared Van Bortel
2023-12-21 | CUDA: mul_mat_id always on GPU for batches >= 32 (#4553) | Johannes Gäßler
2023-12-21 | readme : update coding guidelines | Georgi Gerganov
2023-12-21 | py : open merges file as 'utf-8' (#4566) | howlger
Otherwise, on Windows converting bling-phi-2-v0 (<https://huggingface.co/llmware/bling-phi-2-v0>) via convert-hf-to-gguf.py will fail with the following error:
```
Traceback (most recent call last):
  File "C:\Users\User\git\gguf\convert-hf-to-gguf.py", line 1061, in <module>
    model_instance.set_vocab()
  File "C:\Users\User\git\gguf\convert-hf-to-gguf.py", line 52, in set_vocab
    self._set_vocab_gpt2()
  File "C:\Users\User\git\gguf\convert-hf-to-gguf.py", line 264, in _set_vocab_gpt2
    special_vocab = gguf.SpecialVocab(dir_model, load_merges=True)
  File "C:\Users\User\git\gguf\gguf\vocab.py", line 33, in __init__
    self._load(Path(path))
  File "C:\Users\User\git\gguf\gguf\vocab.py", line 81, in _load
    self._try_load_merges_txt(path)
  File "C:\Users\User\git\gguf\gguf\vocab.py", line 95, in _try_load_merges_txt
    for line in fp:
  File "C:\Users\User\miniconda3\envs\gguf\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 1415: character maps to <undefined>
```
2023-12-21 | cuda : better error message for ggml_get_rows (#4561) | bobqianic
* Update ggml-cuda.cu
* Update ggml-cuda.cu
* Update ggml-cuda.cu
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-12-21 | cuda : replace asserts in wrong architecture checks with __trap (#4556) | slaren
* cuda : replace asserts in wrong architecture checks with __trap
* make bad_arch noreturn, remove returns
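A minimal sketch of the pattern (the message text and the example kernel are illustrative): a device helper that halts the kernel with __trap() when code compiled without support for the current architecture is reached, so call sites no longer need dummy return values.

```cpp
#include <cstdio>

static __device__ void bad_arch() {
    printf("ERROR: ggml-cuda was compiled without support for the current GPU architecture\n");
    __trap(); // does not return; the helper is effectively noreturn
}

__global__ void k_requires_dp4a(int * out, int a, int b) {
#if __CUDA_ARCH__ >= 610
    *out = __dp4a(a, b, 0); // fast path on architectures that support it
#else
    bad_arch();             // older architectures: abort instead of returning garbage
#endif
}
```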
2023-12-21 | llama : disable per-tensor info prints on model load (#4562) | Johannes Gäßler
2023-12-21 | Fix access violation in ggml_cuda_free_data if tensor->extra is NULL (#4554) | LoganDark
2023-12-20 | CUDA: Faster Mixtral prompt processing (#4538) | Johannes Gäßler
* CUDA: make MoE tensors contiguous for batch size>1
* Update ggml-cuda.cu
  Co-authored-by: slaren <slarengh@gmail.com>
---------
Co-authored-by: slaren <slarengh@gmail.com>
2023-12-19 | ggml : fixed check for _MSC_VER (#4535) | Eric Sommerlade
Co-authored-by: Eric Sommerlade <ersomme@microsoft.com>
2023-12-18 | ggml-cuda: Fix HIP build (#4528) | arlo-phoenix
Regression of #4490. Adds defines for two new datatypes, cublasComputeType_t and cudaDataType_t. Currently uses the deprecated hipblasDatatype_t since the newer ones are very recent.
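A hedged sketch of the compatibility defines described above (the guard macro is assumed and the exact mapping in the patch may differ): when the CUDA sources are built for HIP, the two CUDA type names are mapped onto the deprecated but still available hipblasDatatype_t until the newer ROCm equivalents are widely shipped.

```cpp
#if defined(GGML_USE_HIPBLAS)
#define cublasComputeType_t hipblasDatatype_t
#define cudaDataType_t      hipblasDatatype_t
#endif
```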
2023-12-18 | llama.swiftui : add tinyllama 1.1B F16 | Georgi Gerganov
2023-12-18 | llama.swiftui : add more models | Georgi Gerganov
2023-12-18 | llama : add phi-2 + fix NeoX rope + ggml_mul_mat_set_prec (#4490) | Ebey Abraham
* phi2 implementation
* fix breaking change
* phi-2 : various fixes
* phi-2 : use layer norm eps
* py : whitespaces
* llama : fix meta KV override bug
* convert : phi don't add BOS token
* convert : revert "added_tokens_decoder" change
* phi-2 : scale Q instead of KQ for better precision
* ggml : fix NeoX rope to rotate just first n_dims
* cuda : less diff in the rope_neox kernel
* ggml : add ggml_mul_mat_set_prec ggml-ci
* Update ggml-cuda.cu
  Co-authored-by: slaren <slarengh@gmail.com>
* Update ggml-cuda.cu
  Co-authored-by: slaren <slarengh@gmail.com>
* cuda : ggml_cuda_op_mul_mat_cublas support F32 precision
* cuda : remove obsolete comment
---------
Co-authored-by: Ebey Abraham <ebeyabraham@microsoft.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: slaren <slarengh@gmail.com>
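A minimal sketch of the new ggml_mul_mat_set_prec call (the wrapper is illustrative): it requests F32 accumulation for one specific matmul result, here an attention KQ product of the kind the phi-2 precision fixes are concerned with.

```c
#include "ggml.h"

struct ggml_tensor * attn_scores_f32(struct ggml_context * ctx,
                                     struct ggml_tensor * k,
                                     struct ggml_tensor * q) {
    struct ggml_tensor * kq = ggml_mul_mat(ctx, k, q);
    ggml_mul_mat_set_prec(kq, GGML_PREC_F32); // accumulate this matmul in F32
    return kq;
}
```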
2023-12-18 | llama : fix try_override for bool_value which always return true (#4519) | hankcs
2023-12-17 | decode : fix logits_valid for legacy API (#4516) | Jared Van Bortel
2023-12-17 | readme : update hot topics | Georgi Gerganov
2023-12-17 | llama.swiftui : add bench functionality (#4483) | Georgi Gerganov
* llama.swiftui : add bench button
* llama.swiftui : initial bench functionality
* force to use n_gpu_layers on simulator
* add download buttons & expose llamaState.loadModel
* update project.pbxproj
* comment #Preview & fix editorconfig check
* gitignore : xcode stuff
* llama.swiftui : UX improvements
* llama.swiftui : avoid data copy via "downloadTask"
* llama.swiftui : remove model from project
* llama : remove "mostly" from model infos
* llama.swiftui : improve bench
---------
Co-authored-by: jhen <developer@jhen.me>
2023-12-17 | gguf-py : fail fast on nonsensical special token IDs (#4489) | Jared Van Bortel
2023-12-17 | build : Check the ROCm installation location (#4485) | Matheus Gabriel Alves Silva
* build : Check the ROCm installation location
* more generic approach
* fixup! It was returning the path instead of the command output
* fixup! Trailing whitespace
2023-12-17 | finetune : keep allocs alive until all allocations are done (#4486) | slaren
2023-12-17 | server : disable llm logs if SERVER_VERBOSE is off (#3792) | olexiyb
2023-12-17 | server : fix grammar being ignored (#4494) | AdithyanI
Fix bug in identifying the grammar.
2023-12-17 | server : fix possible ambiguity in content type charset (#4501) | Alexey Parfenov
2023-12-17 | server : allow requests larger than 8K (#4500) | mzcu
2023-12-17 | Link to cublas dynamically on Windows even with LLAMA_STATIC (#4506) | Bach Le