2024-01-28  flake.lock: Update (#5162)  [Georgi Gerganov]
2024-01-28  Apply min_p to unsorted tokens (#5115)  [Johannes Gäßler]
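min_p filtering keeps only tokens whose probability is at least min_p times that of the most likely token. Because the threshold depends only on the maximum, it can be applied before any sorting. A minimal sketch of the idea (illustrative, not the code from #5115; `candidate` and `apply_min_p` are made-up names):

```
#include <algorithm>
#include <cmath>
#include <vector>

struct candidate { int id; float logit; };

// Filter in place: keep tokens with p >= min_p * p_max.
// In logit space this is logit >= max_logit + log(min_p),
// so neither a softmax nor a sort is required.
void apply_min_p(std::vector<candidate> & cands, float min_p) {
    float max_logit = -INFINITY;
    for (const auto & c : cands) max_logit = std::max(max_logit, c.logit);
    const float threshold = max_logit + std::log(min_p);
    cands.erase(std::remove_if(cands.begin(), cands.end(),
                [&](const candidate & c) { return c.logit < threshold; }),
                cands.end());
}
```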
2024-01-28  Tests for min_p, sampling queue (#5147)  [Johannes Gäßler]
2024-01-28  readme : add link to rust bindings (#5148)  [Marcus Dunn]
    * added link to another set of rust bindings, with a brief note on the differences
    * fixed link name
2024-01-28  llama : add support for Orion-14B (#5118)  [sharpHL]
    * add support for Orion-14B (https://huggingface.co/OrionStarAI/Orion-14B-Chat)
    * flake8 support
    * Update llama.cpp (applied repeatedly for review suggestions)
    Co-authored-by: lixiaopu <lixiaopu@cmcm.com>
    Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
    Co-authored-by: slaren <slarengh@gmail.com>
2024-01-28  docker : add server-first container images (#5157)  [Kyle Mistele]
    * feat: add Dockerfiles for each platform that use ./server instead of ./main
    * feat: update .github/workflows/docker.yml to build server-first docker containers
    * doc: add information about running the server with Docker to README.md
    * doc: add information about running with Docker to the server README
    * doc: update n-gpu-layers to show correct GPU usage
    * fix(doc): update container tag from `server` to `server-cuda` for the README example of running the server container with CUDA
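For context, a hypothetical invocation of such a server image (the image tag is taken from the commit message; the model path, port, and flag values are assumptions, so consult the repository README for the authoritative form):

```
docker run --gpus all -v /path/to/models:/models -p 8080:8080 \
    ghcr.io/ggerganov/llama.cpp:server-cuda \
    -m /models/model.gguf --host 0.0.0.0 --port 8080 --n-gpu-layers 99
```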
2024-01-27  llava : support for Yi-VL and fix for mobileVLM (#5093)  [John]
    * Support for Yi-VL, templating fix for mobileVLM
    * ws
    * Update examples/llava/clip.cpp
    * Update llava-cli.cpp
    * Update clip.cpp: bugfix for new conversions
    Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-01-27  sync : ggml  [Georgi Gerganov]
2024-01-27  ggml : check ggml_add src1 type (ggml/708)  [Judd]
    Co-authored-by: Judd <foldl@boxvest.com>
2024-01-27  Remove unused data and add fixes (#5154)  [Michael Klimenko]
    * Remove unused data and add fixes
    * Add missing file
    * Address review comments
    * Replace the scope of vq allocation
2024-01-27  server : add self-extend support (#5104)  [Maximilian Winter]
    * Ported self-extension to the server example
    * Fixed prompt caching without self-extend
    * Added description to the server readme
    * Changed descriptions
    * server : formatting
    * Update server.cpp / README.md (applied repeatedly for review suggestions)
    Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
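Self-Extend stretches the usable context without fine-tuning by keeping exact relative positions for nearby tokens while mapping distant tokens onto a coarser position grid. A rough sketch of that position remapping (a sketch of the general technique under assumed names, not the server's actual code):

```
// Assumed form of Self-Extend-style grouped positions: relative positions
// inside the neighbor window stay exact; beyond it they are compressed by
// integer division with the group size, so far tokens reuse trained ranges.
int self_extend_rel_pos(int rel_pos, int neighbor_window, int group_size) {
    if (rel_pos < neighbor_window) {
        return rel_pos;                                   // neighbor attention
    }
    return neighbor_window + (rel_pos - neighbor_window) / group_size; // grouped attention
}
```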
2024-01-26  Add OpenCL add kernel (#5151)  [0cc4m]
    * Add OpenCL add kernel
    * Put the add kernel into a different string to stay within the MSVC string-length limit; disable float16 support due to bad results
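MSVC rejects any single string literal longer than about 16 KB (error C2026), which is why large OpenCL program sources get split into several literals and concatenated. A hypothetical illustration of both halves of that change (kernel body and names are assumptions, not the PR's code):

```
#include <string>

// An element-wise add kernel in OpenCL C, kept in its own literal so that
// no single literal trips MSVC's ~16 KB limit (error C2026).
static const char * add_kernel_src = R"CLC(
__kernel void add_f32(__global const float * a,
                      __global const float * b,
                      __global       float * c) {
    const size_t i = get_global_id(0);
    c[i] = a[i] + b[i];
}
)CLC";

// The full program source is assembled from the per-kernel literals.
static std::string build_program_source(const std::string & other_kernels) {
    return other_kernels + add_kernel_src;
}
```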
2024-01-26  cmake : pass CPU architecture flags to nvcc (#5146)  [Jared Van Bortel]
2024-01-26  cuda : fix tensor size calculation for non-split buffer (#5145)  [slaren]
2024-01-26  ggml-alloc : add 10% margin to the buffer sizes (#5149)  [slaren]
2024-01-26  ggml : update softmax n_task calculation (#5126)  [snadampal]
    Updated the n_task calculation to use the maximum number of threads available. This improves prompt-eval performance by around 5% for DOT kernels and around 10% for MMLA kernels on AWS Graviton3.
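A plausible reading of the change, as a sketch (not the actual ggml code): derive the task count from the thread-pool size rather than a fixed heuristic, capped by the row count so that no task is empty:

```
#include <algorithm>

// Hypothetical: one softmax task per available thread, never more than rows.
int softmax_n_tasks(int n_threads, int n_rows) {
    return std::min(n_threads, n_rows);
}
```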
2024-01-26  scripts : move run-with-preset.py from root to scripts folder  [Georgi Gerganov]
2024-01-26  tests : gitignore test-c.o  [Georgi Gerganov]
2024-01-26  server : refactored the task processing logic (#5065)  [Xuan Son Nguyen]
    * server: add llama_server_queue struct
    * server: add llama_server_response_event
    * server: add comments
    * server: move all mutexes away from server.cpp
    * server: correct multitask response
    * server: only add back deferred tasks when one slot is available
    * server: fix a race condition caused by "request_completion"
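A minimal sketch of the kind of mutex-protected queue such a refactor centralizes (hypothetical structure; the real llama_server_queue interface lives in the server example):

```
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>

// Hypothetical single-consumer task queue: HTTP handler threads post tasks;
// the inference loop pops them under one mutex, which keeps the locking out
// of the rest of the server code.
struct task_queue {
    std::mutex mtx;
    std::condition_variable cv;
    std::deque<std::function<void()>> tasks;

    void post(std::function<void()> task) {
        { std::lock_guard<std::mutex> lock(mtx); tasks.push_back(std::move(task)); }
        cv.notify_one();
    }

    std::function<void()> pop() {
        std::unique_lock<std::mutex> lock(mtx);
        cv.wait(lock, [&] { return !tasks.empty(); });
        auto task = std::move(tasks.front());
        tasks.pop_front();
        return task;
    }
};
```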
2024-01-26  ci : add model tests + script wrapper (#4586)  [crasm]
    * scripts : add lib.sh and lib_test.sh
    * scripts : stub out new ci-run.sh script
    * scripts : switch to PascalCase for functions. This looks a little odd at first, but it is a useful convention for distinguishing our own commands from builtins.
    * scripts : add some fancy conversion from snake_case to PascalCase
    * Add venv to ci/run.sh
    * Revert scripts work
    * scripts : add wrapper script for local use of ci/run.sh
    * Simplify .gitignore for tests, clang-tidy fixes
    * Label all ctest tests
    * ci : ctest uses -L main
    * Attempt at writing ctest_with_model
    * Update test-model-load-cancel
    * ci : add ctest_with_model for debug and release
    * Fix gg_get_model function
    * got stuck on CMake
    * Add get_model.cpp to tests/CMakeLists.txt
    * Fix README.md output for ctest_with_model
    * workflows : use `-L main` for all ctest
    * Fixes
    * GG_RUN_CTEST_MODELFILE => LLAMACPP_TESTMODELFILE
    * Always show a warning rather than failing if the model file variable is not set
    * scripts : update usage text for ci-run.sh
2024-01-26  metal : remove unused `n_buffers` and `buffers` (#5129)  [Paul Tsochantaris]
2024-01-26  gguf : fix "general.alignment" type in gguf_reader.py (#5136)  [Riceball LEE]
2024-01-26  readme : update hot topics  [Georgi Gerganov]
2024-01-26  Another bucket sort (#5109)  [Kawrakow]
    * Initial bucket sort
    * Bucket sort: slightly better version
    * Bucket sort: another minor improvement
    Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
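Bucket sort helps here because sampling rarely needs a full sort of the candidates: one bucketing pass over the logit range isolates the top tokens, and only those need sorting. A self-contained sketch of the idea (illustrative, not the PR's code):

```
#include <algorithm>
#include <functional>
#include <vector>

// Hypothetical top-k selection via one bucketing pass: spread values into
// buckets by logit, walk buckets from the top until k values are collected,
// then sort only what was collected instead of the whole array.
std::vector<float> top_k_bucketed(const std::vector<float> & logits, size_t k, int n_buckets = 128) {
    const auto [mn_it, mx_it] = std::minmax_element(logits.begin(), logits.end());
    const float mn = *mn_it, mx = *mx_it;
    const float scale = (mx > mn) ? (n_buckets - 1) / (mx - mn) : 0.0f;

    std::vector<std::vector<float>> buckets(n_buckets);
    for (float l : logits) {
        buckets[(int)((l - mn) * scale)].push_back(l);
    }

    std::vector<float> out;
    for (int b = n_buckets - 1; b >= 0 && out.size() < k; --b) {
        out.insert(out.end(), buckets[b].begin(), buckets[b].end());
    }
    std::sort(out.begin(), out.end(), std::greater<float>());
    if (out.size() > k) out.resize(k);
    return out;
}
```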
2024-01-25  readme : add MobileVLM 1.7B/3B to the supported models list (#5107)  [XiaotaoChen]
    Co-authored-by: Chenxiaotao03 <chenxiaotao03@meituan.com>
2024-01-25  llama : dynamic temperature sampling (#4972)  [l3utterfly]
    * implemented dynamic temperature sampling from koboldcpp
    * removed trailing whitespace
    * removed unused temp parameter in llama_sample_entropy
    * exposed exponent_val in the dynamic temp sampler
    * added debug check for printf statements
    * use nullptr in the llama_sample_softmax call inside llama_sample_entropy; this avoids counting the time-taken stats twice
    * return earlier if there is only one candidate (i.e. max_entropy == 0)
    * reformat 't' case in llama_sample_queue
    * check for the one-or-zero-candidates case in llama_sample_entropy
    Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
    Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>
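Dynamic temperature scales the sampling temperature with the normalized entropy of the candidate distribution: near-deterministic distributions are sampled almost greedily, while flat ones get more randomness. A sketch of that mapping (the parameter name exponent_val follows the commit; the surrounding function is assumed):

```
#include <cmath>
#include <vector>

// Hypothetical dynamic-temperature mapping: entropy of the candidate
// distribution, normalized by its maximum (log N), interpolates between
// min_temp and max_temp; exponent_val shapes the interpolation curve.
float dynamic_temperature(const std::vector<float> & probs,
                          float min_temp, float max_temp, float exponent_val) {
    if (probs.size() <= 1) {
        return min_temp; // one or zero candidates: max_entropy == 0
    }
    float entropy = 0.0f;
    for (float p : probs) {
        if (p > 0.0f) entropy -= p * std::log(p);
    }
    const float max_entropy = std::log((float) probs.size());
    const float norm = entropy / max_entropy;              // in [0, 1]
    return min_temp + (max_temp - min_temp) * std::pow(norm, exponent_val);
}
```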
2024-01-25  examples : make pydantic scripts pass mypy and support py3.8 (#5099)  [Jared Van Bortel]
2024-01-25  android : use release cmake build type by default (#5123)  [Valentin Konovalov]
2024-01-25  Fix Q3_K_XS for MoE models (#5113)  [Kawrakow]
    Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2024-01-25  metal : show compile log messages  [Georgi Gerganov]
2024-01-24  cuda : fix 2-bit quants on amd hip (#5105)  [Engininja2]
    * cuda : fix 2-bit quants on AMD HIP
    * use the __low2float intrinsic function for the new quants
2024-01-24  nix-shell: use addToSearchPath  [Michael Hueschen]
    Thanks to @SomeoneSerge for the suggestion!
2024-01-24  nix: add cc to devShell LD_LIBRARY_PATH  [Michael Hueschen]
    This fixes the error I encountered when trying to run the convert.py script in a venv:

    ```
    $ nix develop
    [...]$ source .venv/bin/activate
    (.venv) [...]$ pip3 install -r requirements.txt
    <... clipped ...>
    [...]$ python3 ./convert.py
    Traceback (most recent call last):
      File "/home/mhueschen/projects-reference/llama.cpp/./convert.py", line 40, in <module>
        from sentencepiece import SentencePieceProcessor
      File "/home/mhueschen/projects-reference/llama.cpp/.venv/lib/python3.11/site-packages/sentencepiece/__init__.py", line 13, in <module>
        from . import _sentencepiece
    ImportError: libstdc++.so.6: cannot open shared object file: No such file or directory
    ```

    However, I am not sure this is the cleanest way to address this linker issue...
2024-01-24  llama : pre-allocate input tensors in a separate buffer (#5100)  [slaren]
2024-01-23  metal : disable support for MUL_MAT F32 x F16  [Georgi Gerganov]
2024-01-23  Additional KL-divergence statistics (#5081)  [Kawrakow]
    * perplexity: add top-token probability
    * perplexity: add additional KL-divergence statistics
    * perplexity: better-organized KL-divergence statistics output
    Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2024-01-23  CUDA: more info when no device code (#5088)  [Johannes Gäßler]
2024-01-23  minor : clean-up some warnings and style (#5094)  [Georgi Gerganov]
    * minor : clean up some warnings and style
    * ggml : add comment
2024-01-23  devops : add intel oneapi dockerfile (#5068)  [Xuan Son Nguyen]
    Co-authored-by: Xuan Son Nguyen <xuanson.nguyen@snowpack.eu>
2024-01-23  llama.vim : added api key support (#5090)  [Michael Coppola]
    Co-authored-by: Michael Coppola <info@michaeljcoppola.com>
2024-01-22  llama : fix not enough space in buffer with Qwen (#5086)  [slaren]
2024-01-22  KL-divergence (#5076)  [Kawrakow]
    * kl-divergence: be able to save all logits to a file
    * add the ability to compute KL-divergence
    Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
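For reference, the quantity involved: with P the full-precision model's token distribution and Q the quantized model's distribution over the same context, the per-token statistic is presumably

```
D_{KL}(P \| Q) = \sum_i p_i \log \frac{p_i}{q_i}
```

Saving all base-model logits to a file once (the first bullet) lets any number of quantized runs be compared against them without re-running the base model.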
2024-01-22  ggml : parallelize FP32 conversion when using BLAS (#5045)  [Reinforce-II]
    * make the GGML_TASK_INIT phase able to run multithreaded
    * multithreaded dequantize in mul_mat when using a BLAS library
    * minor fixes
    * update outdated comment
    * fix coding style
    * simplify code
    Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
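BLAS kernels consume FP32, so quantized weights must be converted first; once the init phase may run multithreaded, that conversion can be striped across threads by row. A rough sketch under those assumptions (dequantize_row stands in for ggml's per-type conversion routine; the function itself is made up):

```
#include <cstdint>

// Hypothetical per-thread slice of the dequantization work: thread ith of
// nth converts every nth-th row, mirroring ggml's ith/nth task convention.
void dequantize_for_blas(const void * src, float * dst,
                         int64_t n_rows, int64_t row_size_q, int64_t n_cols,
                         void (*dequantize_row)(const void *, float *, int64_t),
                         int ith, int nth) {
    for (int64_t r = ith; r < n_rows; r += nth) {
        const char * src_row = (const char *) src + r * row_size_q;
        dequantize_row(src_row, dst + r * n_cols, n_cols);
    }
}
```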
2024-01-22  llava : MobileVLM support (#4954)  [XiaotaoChen]
    * MobileVLM native implementation
    * delete the depthwise_conv_2d and permute_cpy code, replacing both with existing functions; optimize the ldp definition; support the LLAMA_PERF option for CMake
    * move the android script to the examples/llava directory
    * fix the editorconfig checks
    Co-authored-by: Chenxiaotao03 <chenxiaotao03@meituan.com>
2024-01-22  flake.nix: add a comment about flakes vs nix  [Someone Serge]
2024-01-22  nix: add a comment on the many nixpkgs-with-cuda instances  [Someone Serge]
2024-01-22  nix: add a comment about makeScope  [Someone Serge]
2024-01-22  nix: refactor the cleanSource rules  [Someone Serge]
2024-01-22  workflows: nix-ci: drop the redundant "paths" filter  [Someone Serge]
2024-01-22  workflows: nix-build-aarch64: rate limit  [Someone Serge]