2024-01-28  flake.lock: Update (#5162)  [Georgi Gerganov]
2024-01-28  Apply min_p to unsorted tokens (#5115)  [Johannes Gäßler]
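min_p filtering keeps only tokens whose probability is at least min_p times that of the most likely token. Because the threshold depends only on the maximum, it can be applied before any sorting. A minimal sketch of the idea (illustrative, not the code from #5115; `candidate` and `apply_min_p` are made-up names):

```
#include <algorithm>
#include <cmath>
#include <vector>

struct candidate { int id; float logit; };

// Filter in place: keep tokens with p >= min_p * p_max.
// In logit space this is logit >= max_logit + log(min_p),
// so neither a softmax nor a sort is required.
void apply_min_p(std::vector<candidate> & cands, float min_p) {
    float max_logit = -INFINITY;
    for (const auto & c : cands) max_logit = std::max(max_logit, c.logit);
    const float threshold = max_logit + std::log(min_p);
    cands.erase(std::remove_if(cands.begin(), cands.end(),
                [&](const candidate & c) { return c.logit < threshold; }),
                cands.end());
}
```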
2024-01-28  Tests for min_p, sampling queue (#5147)  [Johannes Gäßler]
2024-01-28  readme : add link to rust bindings (#5148)  [Marcus Dunn]
    * added link to another set of rust bindings, with a brief note on the differences
    * fixed link name
2024-01-28  llama : add support for Orion-14B (#5118)  [sharpHL]
    * add support for Orion-14B (https://huggingface.co/OrionStarAI/Orion-14B-Chat)
    * flake8 support
    * Update llama.cpp (applied repeatedly for review suggestions)
    Co-authored-by: lixiaopu <lixiaopu@cmcm.com>
    Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
    Co-authored-by: slaren <slarengh@gmail.com>
2024-01-28  docker : add server-first container images (#5157)  [Kyle Mistele]
    * feat: add Dockerfiles for each platform that use ./server instead of ./main
    * feat: update .github/workflows/docker.yml to build server-first docker containers
    * doc: add information about running the server with Docker to README.md
    * doc: add information about running with Docker to the server README
    * doc: update n-gpu-layers to show correct GPU usage
    * fix(doc): update container tag from `server` to `server-cuda` for the README example of running the server container with CUDA
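For context, a hypothetical invocation of such a server image (the image tag is taken from the commit message; the model path, port, and flag values are assumptions, so consult the repository README for the authoritative form):

```
docker run --gpus all -v /path/to/models:/models -p 8080:8080 \
    ghcr.io/ggerganov/llama.cpp:server-cuda \
    -m /models/model.gguf --host 0.0.0.0 --port 8080 --n-gpu-layers 99
```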
2024-01-27  llava : support for Yi-VL and fix for mobileVLM (#5093)  [John]
    * Support for Yi-VL, templating fix for mobileVLM
    * ws
    * Update examples/llava/clip.cpp
    * Update llava-cli.cpp
    * Update clip.cpp: bugfix for new conversions
    Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-01-27  sync : ggml  [Georgi Gerganov]
2024-01-27  ggml : check ggml_add src1 type (ggml/708)  [Judd]
    Co-authored-by: Judd <foldl@boxvest.com>
2024-01-27  Remove unused data and add fixes (#5154)  [Michael Klimenko]
    * Remove unused data and add fixes
    * Add missing file
    * Address review comments
    * Replace the scope of vq allocation
2024-01-27  server : add self-extend support (#5104)  [Maximilian Winter]
    * Ported self-extension to the server example
    * Fixed prompt caching without self-extend
    * Added description to the server readme
    * Changed descriptions
    * server : formatting
    * Update server.cpp / README.md (applied repeatedly for review suggestions)
    Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
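Self-Extend stretches the usable context without fine-tuning by keeping exact relative positions for nearby tokens while mapping distant tokens onto a coarser position grid. A rough sketch of that position remapping (a sketch of the general technique under assumed names, not the server's actual code):

```
// Assumed form of Self-Extend-style grouped positions: relative positions
// inside the neighbor window stay exact; beyond it they are compressed by
// integer division with the group size, so far tokens reuse trained ranges.
int self_extend_rel_pos(int rel_pos, int neighbor_window, int group_size) {
    if (rel_pos < neighbor_window) {
        return rel_pos;                                   // neighbor attention
    }
    return neighbor_window + (rel_pos - neighbor_window) / group_size; // grouped attention
}
```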
2024-01-26  Add OpenCL add kernel (#5151)  [0cc4m]
    * Add OpenCL add kernel
    * Put the add kernel into a different string to stay within the MSVC string-length limit; disable float16 support due to bad results
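MSVC rejects any single string literal longer than about 16 KB (error C2026), which is why large OpenCL program sources get split into several literals and concatenated. A hypothetical illustration of both halves of that change (kernel body and names are assumptions, not the PR's code):

```
#include <string>

// An element-wise add kernel in OpenCL C, kept in its own literal so that
// no single literal trips MSVC's ~16 KB limit (error C2026).
static const char * add_kernel_src = R"CLC(
__kernel void add_f32(__global const float * a,
                      __global const float * b,
                      __global       float * c) {
    const size_t i = get_global_id(0);
    c[i] = a[i] + b[i];
}
)CLC";

// The full program source is assembled from the per-kernel literals.
static std::string build_program_source(const std::string & other_kernels) {
    return other_kernels + add_kernel_src;
}
```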
2024-01-26  cmake : pass CPU architecture flags to nvcc (#5146)  [Jared Van Bortel]
2024-01-26  cuda : fix tensor size calculation for non-split buffer (#5145)  [slaren]
2024-01-26  ggml-alloc : add 10% margin to the buffer sizes (#5149)  [slaren]
2024-01-26  ggml : update softmax n_task calculation (#5126)  [snadampal]
    Updated the n_task calculation to use the maximum number of threads available. This improves prompt-eval performance by around 5% for DOT kernels and around 10% for MMLA kernels on AWS Graviton3.
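A plausible reading of the change, as a sketch (not the actual ggml code): derive the task count from the thread-pool size rather than a fixed heuristic, capped by the row count so that no task is empty:

```
#include <algorithm>

// Hypothetical: one softmax task per available thread, never more than rows.
int softmax_n_tasks(int n_threads, int n_rows) {
    return std::min(n_threads, n_rows);
}
```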
2024-01-26  scripts : move run-with-preset.py from root to scripts folder  [Georgi Gerganov]
2024-01-26  tests : gitignore test-c.o  [Georgi Gerganov]
2024-01-26  server : refactored the task processing logic (#5065)  [Xuan Son Nguyen]
    * server: add llama_server_queue struct
    * server: add llama_server_response_event
    * server: add comments
    * server: move all mutexes away from server.cpp
    * server: correct multitask response
    * server: only add back deferred tasks when one slot is available
    * server: fix a race condition caused by "request_completion"
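A minimal sketch of the kind of mutex-protected queue such a refactor centralizes (hypothetical structure; the real llama_server_queue interface lives in the server example):

```
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>

// Hypothetical single-consumer task queue: HTTP handler threads post tasks;
// the inference loop pops them under one mutex, which keeps the locking out
// of the rest of the server code.
struct task_queue {
    std::mutex mtx;
    std::condition_variable cv;
    std::deque<std::function<void()>> tasks;

    void post(std::function<void()> task) {
        { std::lock_guard<std::mutex> lock(mtx); tasks.push_back(std::move(task)); }
        cv.notify_one();
    }

    std::function<void()> pop() {
        std::unique_lock<std::mutex> lock(mtx);
        cv.wait(lock, [&] { return !tasks.empty(); });
        auto task = std::move(tasks.front());
        tasks.pop_front();
        return task;
    }
};
```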
2024-01-26  ci : add model tests + script wrapper (#4586)  [crasm]
    * scripts : add lib.sh and lib_test.sh
    * scripts : stub out new ci-run.sh script
    * scripts : switch to PascalCase for functions. This looks a little odd at first, but it is a useful convention for distinguishing our own commands from builtins.
    * scripts : add some fancy conversion from snake_case to PascalCase
    * Add venv to ci/run.sh
    * Revert scripts work
    * scripts : add wrapper script for local use of ci/run.sh
    * Simplify .gitignore for tests, clang-tidy fixes
    * Label all ctest tests
    * ci : ctest uses -L main
    * Attempt at writing ctest_with_model
    * Update test-model-load-cancel
    * ci : add ctest_with_model for debug and release
    * Fix gg_get_model function
    * got stuck on CMake
    * Add get_model.cpp to tests/CMakeLists.txt
    * Fix README.md output for ctest_with_model
    * workflows : use `-L main` for all ctest
    * Fixes
    * GG_RUN_CTEST_MODELFILE => LLAMACPP_TESTMODELFILE
    * Always show a warning rather than failing if the model file variable is not set
    * scripts : update usage text for ci-run.sh
2024-01-26  metal : remove unused `n_buffers` and `buffers` (#5129)  [Paul Tsochantaris]
2024-01-26  gguf : fix "general.alignment" type in gguf_reader.py (#5136)  [Riceball LEE]
2024-01-26  readme : update hot topics  [Georgi Gerganov]
2024-01-26  Another bucket sort (#5109)  [Kawrakow]
    * Initial bucket sort
    * Bucket sort: slightly better version
    * Bucket sort: another minor improvement
    Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
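Bucket sort helps here because sampling rarely needs a full sort of the candidates: one bucketing pass over the logit range isolates the top tokens, and only those need sorting. A self-contained sketch of the idea (illustrative, not the PR's code):

```
#include <algorithm>
#include <functional>
#include <vector>

// Hypothetical top-k selection via one bucketing pass: spread values into
// buckets by logit, walk buckets from the top until k values are collected,
// then sort only what was collected instead of the whole array.
std::vector<float> top_k_bucketed(const std::vector<float> & logits, size_t k, int n_buckets = 128) {
    const auto [mn_it, mx_it] = std::minmax_element(logits.begin(), logits.end());
    const float mn = *mn_it, mx = *mx_it;
    const float scale = (mx > mn) ? (n_buckets - 1) / (mx - mn) : 0.0f;

    std::vector<std::vector<float>> buckets(n_buckets);
    for (float l : logits) {
        buckets[(int)((l - mn) * scale)].push_back(l);
    }

    std::vector<float> out;
    for (int b = n_buckets - 1; b >= 0 && out.size() < k; --b) {
        out.insert(out.end(), buckets[b].begin(), buckets[b].end());
    }
    std::sort(out.begin(), out.end(), std::greater<float>());
    if (out.size() > k) out.resize(k);
    return out;
}
```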
2024-01-25  readme : add MobileVLM 1.7B/3B to the supported models list (#5107)  [XiaotaoChen]
    Co-authored-by: Chenxiaotao03 <chenxiaotao03@meituan.com>
2024-01-25  llama : dynamic temperature sampling (#4972)  [l3utterfly]
    * implemented dynamic temperature sampling from koboldcpp
    * removed trailing whitespace
    * removed unused temp parameter in llama_sample_entropy
    * exposed exponent_val in the dynamic temp sampler
    * added debug check for printf statements
    * use nullptr in the llama_sample_softmax call inside llama_sample_entropy; this avoids counting the time-taken stats twice
    * return earlier if there is only one candidate (i.e. max_entropy == 0)
    * reformat 't' case in llama_sample_queue
    * check for the one-or-zero-candidates case in llama_sample_entropy
    Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
    Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>
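Dynamic temperature scales the sampling temperature with the normalized entropy of the candidate distribution: near-deterministic distributions are sampled almost greedily, while flat ones get more randomness. A sketch of that mapping (the parameter name exponent_val follows the commit; the surrounding function is assumed):

```
#include <cmath>
#include <vector>

// Hypothetical dynamic-temperature mapping: entropy of the candidate
// distribution, normalized by its maximum (log N), interpolates between
// min_temp and max_temp; exponent_val shapes the interpolation curve.
float dynamic_temperature(const std::vector<float> & probs,
                          float min_temp, float max_temp, float exponent_val) {
    if (probs.size() <= 1) {
        return min_temp; // one or zero candidates: max_entropy == 0
    }
    float entropy = 0.0f;
    for (float p : probs) {
        if (p > 0.0f) entropy -= p * std::log(p);
    }
    const float max_entropy = std::log((float) probs.size());
    const float norm = entropy / max_entropy;              // in [0, 1]
    return min_temp + (max_temp - min_temp) * std::pow(norm, exponent_val);
}
```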
2024-01-25  examples : make pydantic scripts pass mypy and support py3.8 (#5099)  [Jared Van Bortel]
2024-01-25  android : use release cmake build type by default (#5123)  [Valentin Konovalov]
2024-01-25  Fix Q3_K_XS for MoE models (#5113)  [Kawrakow]
    Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2024-01-25  metal : show compile log messages  [Georgi Gerganov]
2024-01-24  cuda : fix 2-bit quants on amd hip (#5105)  [Engininja2]
    * cuda : fix 2-bit quants on AMD HIP
    * use the __low2float intrinsic function for the new quants
2024-01-24  nix-shell: use addToSearchPath  [Michael Hueschen]
    Thanks to @SomeoneSerge for the suggestion!
2024-01-24  nix: add cc to devShell LD_LIBRARY_PATH  [Michael Hueschen]
    This fixes the error I encountered when trying to run the convert.py script in a venv:

    ```
    $ nix develop
    [...]$ source .venv/bin/activate
    (.venv) [...]$ pip3 install -r requirements.txt
    <... clipped ...>
    [...]$ python3 ./convert.py
    Traceback (most recent call last):
      File "/home/mhueschen/projects-reference/llama.cpp/./convert.py", line 40, in <module>
        from sentencepiece import SentencePieceProcessor
      File "/home/mhueschen/projects-reference/llama.cpp/.venv/lib/python3.11/site-packages/sentencepiece/__init__.py", line 13, in <module>
        from . import _sentencepiece
    ImportError: libstdc++.so.6: cannot open shared object file: No such file or directory
    ```

    However, I am not sure this is the cleanest way to address this linker issue...
2024-01-24  llama : pre-allocate input tensors in a separate buffer (#5100)  [slaren]
2024-01-23  metal : disable support for MUL_MAT F32 x F16  [Georgi Gerganov]
2024-01-23  Additional KL-divergence statistics (#5081)  [Kawrakow]
    * perplexity: add top-token probability
    * perplexity: add additional KL-divergence statistics
    * perplexity: better-organized KL-divergence statistics output
    Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2024-01-23  CUDA: more info when no device code (#5088)  [Johannes Gäßler]
2024-01-23  minor : clean-up some warnings and style (#5094)  [Georgi Gerganov]
    * minor : clean up some warnings and style
    * ggml : add comment
2024-01-23  devops : add intel oneapi dockerfile (#5068)  [Xuan Son Nguyen]
    Co-authored-by: Xuan Son Nguyen <xuanson.nguyen@snowpack.eu>
2024-01-23  llama.vim : added api key support (#5090)  [Michael Coppola]
    Co-authored-by: Michael Coppola <info@michaeljcoppola.com>
2024-01-22  llama : fix not enough space in buffer with Qwen (#5086)  [slaren]
2024-01-22  KL-divergence (#5076)  [Kawrakow]
    * kl-divergence: be able to save all logits to a file
    * add the ability to compute KL-divergence
    Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
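For reference, the quantity involved: with P the full-precision model's token distribution and Q the quantized model's distribution over the same context, the per-token statistic is presumably

```
D_{KL}(P \| Q) = \sum_i p_i \log \frac{p_i}{q_i}
```

Saving all base-model logits to a file once (the first bullet) lets any number of quantized runs be compared against them without re-running the base model.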
2024-01-22  ggml : parallelize FP32 conversion when using BLAS (#5045)  [Reinforce-II]
    * make the GGML_TASK_INIT phase able to run multithreaded
    * multithreaded dequantize in mul_mat when using a BLAS library
    * minor fixes
    * update outdated comment
    * fix coding style
    * simplify code
    Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
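BLAS kernels consume FP32, so quantized weights must be converted first; once the init phase may run multithreaded, that conversion can be striped across threads by row. A rough sketch under those assumptions (dequantize_row stands in for ggml's per-type conversion routine; the function itself is made up):

```
#include <cstdint>

// Hypothetical per-thread slice of the dequantization work: thread ith of
// nth converts every nth-th row, mirroring ggml's ith/nth task convention.
void dequantize_for_blas(const void * src, float * dst,
                         int64_t n_rows, int64_t row_size_q, int64_t n_cols,
                         void (*dequantize_row)(const void *, float *, int64_t),
                         int ith, int nth) {
    for (int64_t r = ith; r < n_rows; r += nth) {
        const char * src_row = (const char *) src + r * row_size_q;
        dequantize_row(src_row, dst + r * n_cols, n_cols);
    }
}
```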
2024-01-22  llava : MobileVLM support (#4954)  [XiaotaoChen]
    * MobileVLM native implementation
    * delete the depthwise_conv_2d and permute_cpy code, replacing both with existing functions; optimize the ldp definition; support the LLAMA_PERF option for CMake
    * move the android script to the examples/llava directory
    * fix the editorconfig checks
    Co-authored-by: Chenxiaotao03 <chenxiaotao03@meituan.com>
2024-01-22  flake.nix: add a comment about flakes vs nix  [Someone Serge]
2024-01-22  nix: add a comment on the many nixpkgs-with-cuda instances  [Someone Serge]
2024-01-22  nix: add a comment about makeScope  [Someone Serge]
2024-01-22  nix: refactor the cleanSource rules  [Someone Serge]
2024-01-22  workflows: nix-ci: drop the redundant "paths" filter  [Someone Serge]
2024-01-22  workflows: nix-build-aarch64: rate limit  [Someone Serge]