2024-01-26  Add OpenCL add kernel (#5151)  (0cc4m)
* Add OpenCL add kernel
* Put add kernel into different string to stay within MSVC string length limit, disable float16 support due to bad results
2024-01-26  cmake : pass CPU architecture flags to nvcc (#5146)  (Jared Van Bortel)
2024-01-26  cuda : fix tensor size calculation for non-split buffer (#5145)  (slaren)
2024-01-26  ggml-alloc : add 10% margin to the buffer sizes (#5149)  (slaren)
2024-01-26  ggml : update softmax n_task calculation (#5126)  (snadampal)
Updated the n_task calculation to use the maximum number of threads possible. This has improved prompt eval performance by around 5% for DOT kernels and by around 10% for MMLA kernels on AWS Graviton3.
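For orientation only, a rough sketch of the idea in isolation; the function below is hypothetical and does not reflect the actual ggml task-scheduling code:

```
#include <cstdint>

// Hypothetical sketch: let the soft_max op claim every available thread
// instead of a smaller, row-derived task count. Names do not match ggml.
static int softmax_n_tasks(int n_threads, int64_t n_rows) {
    // before (illustrative): the task count was capped by a heuristic tied
    // to the tensor shape, which can leave threads idle
    // after: simply use the maximum number of threads possible
    (void) n_rows;
    return n_threads;
}
```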
2024-01-26  scripts : move run-with-preset.py from root to scripts folder  (Georgi Gerganov)
2024-01-26  tests : gitignore test-c.o  (Georgi Gerganov)
2024-01-26  server : refactored the task processing logic (#5065)  (Xuan Son Nguyen)
* server: add llama_server_queue struct
* server: add llama_server_response_event
* server: add comments
* server: move all mutexes away from server.cpp
* server: correct multitask response
* server: only add back deferred tasks when one slot is available
* server: fix a race condition caused by "request_completion"
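The list above only names the pieces. As a rough sketch of the pattern it describes (a mutex-protected queue with deferred tasks that are re-queued once a slot frees up), assuming a task is just a callable rather than the real llama_server_queue payload:

```
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <vector>

// Minimal illustrative task queue; not the actual server.cpp types.
struct task_queue {
    std::mutex mtx;
    std::condition_variable cv;
    std::deque<std::function<void()>>  tasks;    // ready to run
    std::vector<std::function<void()>> deferred; // waiting for a free slot

    void post(std::function<void()> t) {
        {
            std::lock_guard<std::mutex> lock(mtx);
            tasks.push_back(std::move(t));
        }
        cv.notify_one();
    }

    // park a task while all slots are busy
    void defer(std::function<void()> t) {
        std::lock_guard<std::mutex> lock(mtx);
        deferred.push_back(std::move(t));
    }

    // called when a slot becomes available: move deferred work back in
    void notify_slot_changed() {
        std::lock_guard<std::mutex> lock(mtx);
        for (auto & t : deferred) {
            tasks.push_back(std::move(t));
        }
        deferred.clear();
        cv.notify_one();
    }

    std::function<void()> wait_next() {
        std::unique_lock<std::mutex> lock(mtx);
        cv.wait(lock, [&] { return !tasks.empty(); });
        auto t = std::move(tasks.front());
        tasks.pop_front();
        return t;
    }
};
```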
2024-01-26  ci : add model tests + script wrapper (#4586)  (crasm)
* scripts : add lib.sh and lib_test.sh
* scripts : stub out new ci-run.sh script
* scripts : switch to PascalCase for functions
  This looks a little odd at first, but I find it very useful as a convention to know if a command is part of our code vs a builtin.
* scripts : add some fancy conversion from snake_case to PascalCase
* Add venv to ci/run.sh
* Revert scripts work
* scripts : add wrapper script for local use of ci/run.sh
* Simplify .gitignore for tests, clang-tidy fixes
* Label all ctest tests
* ci : ctest uses -L main
* Attempt at writing ctest_with_model
* Update test-model-load-cancel
* ci : add ctest_with_model for debug and release ggml-ci
* Fix gg_get_model function ggml-ci
* got stuck on CMake
* Add get_model.cpp to tests/CMakeLists.txt ggml-ci
* Fix README.md output for ctest_with_model ggml-ci
* workflows : use `-L main` for all ctest ggml-ci
* Fixes
* GG_RUN_CTEST_MODELFILE => LLAMACPP_TESTMODELFILE
* Always show warning rather than failing if model file variable is not set
* scripts : update usage text for ci-run.sh
2024-01-26  metal : remove unused `n_buffers` and `buffers` (#5129)  (Paul Tsochantaris)
2024-01-26  gguf : fix "general.alignment" type in gguf_reader.py (#5136)  (Riceball LEE)
2024-01-26  readme : update hot topics  (Georgi Gerganov)
2024-01-26  Another bucket sort (#5109)  (Kawrakow)
* Initial bucket sort
* Bucket sort: slightly better version
* Bucket sort: another minor improvement

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
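For readers unfamiliar with the technique, a generic bucket sort over softmax probabilities (assumed to lie in [0, 1)) might look like the sketch below; this illustrates the general idea only, not the code from the PR:

```
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

// Bucket sort, descending, for values in [0, 1): scatter into coarse buckets,
// then sort each small bucket and concatenate from the highest bucket down.
static void bucket_sort_desc(std::vector<float> & p, size_t n_buckets = 128) {
    std::vector<std::vector<float>> buckets(n_buckets);
    for (float x : p) {
        size_t b = static_cast<size_t>(x * n_buckets);
        if (b >= n_buckets) b = n_buckets - 1; // guard against x == 1.0f
        buckets[b].push_back(x);
    }
    size_t k = 0;
    for (size_t b = n_buckets; b-- > 0; ) {
        // each bucket is small, so sorting it is cheap
        std::sort(buckets[b].begin(), buckets[b].end(), std::greater<float>());
        for (float x : buckets[b]) {
            p[k++] = x;
        }
    }
}
```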
2024-01-25  readme : add MobileVLM 1.7B/3B to the supported models list (#5107)  (XiaotaoChen)
Co-authored-by: Chenxiaotao03 <chenxiaotao03@meituan.com>
2024-01-25  llama : dynamic temperature sampling (#4972)  (l3utterfly)
* implemented dynamic temperature sampling from koboldcpp
* removed trailing whitespace
* removed unused temp parameter in llama_sample_entropy
* exposed exponent_val in dynamic temp sampler
* added debug check for printf statements
* use nullptr in llama_sample_softmax call during llama_sample_entropy
  this avoids counting the time taken stats twice
  Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* return earlier if there is only 1 candidate (i.e. max_entropy == 0)
* reformat 't' case in llama_sample_queue
  Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>
* check for one or zero candidates case in llama_sample_entropy

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>
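A hedged sketch of the entropy-scaled temperature idea referenced above; the parameter names (min_temp, max_temp, exponent_val) follow the discussion, but this is an illustration of the general technique, not the exact llama_sample_entropy implementation:

```
#include <cmath>
#include <vector>

// Sketch: pick a temperature based on how "uncertain" the model already is.
// `probs` is assumed to be softmax-normalized. Illustrative only.
static float dynamic_temperature(const std::vector<float> & probs,
                                 float min_temp, float max_temp, float exponent_val) {
    if (probs.size() <= 1) {
        return min_temp; // single candidate: entropy is 0, nothing to scale
    }
    double entropy = 0.0;
    for (float p : probs) {
        if (p > 0.0f) {
            entropy -= p * std::log(p);
        }
    }
    const double max_entropy = std::log(static_cast<double>(probs.size()));
    const double norm        = entropy / max_entropy; // in [0, 1]
    // low entropy (confident) -> low temperature, high entropy -> high temperature
    return min_temp + (max_temp - min_temp) * static_cast<float>(std::pow(norm, exponent_val));
}
```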
2024-01-25  examples : make pydantic scripts pass mypy and support py3.8 (#5099)  (Jared Van Bortel)
2024-01-25  android : use release cmake build type by default (#5123)  (Valentin Konovalov)
2024-01-25  Fix Q3_K_XS for MoE models (#5113)  (Kawrakow)
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2024-01-25  metal : show compile log messages  (Georgi Gerganov)
2024-01-24  cuda : fix 2-bit quants on amd hip (#5105)  (Engininja2)
* cuda : fix 2-bit quants on amd hip
* use __low2float intrinsic function for new quants
2024-01-24  nix-shell: use addToSearchPath  (Michael Hueschen)
thx to @SomeoneSerge for the suggestion!
2024-01-24  nix: add cc to devShell LD_LIBRARY_PATH  (Michael Hueschen)
this fixes the error I encountered when trying to run the convert.py script in a venv:

```
$ nix develop
[...]$ source .venv/bin/activate
(.venv) [...]$ pip3 install -r requirements.txt
<... clipped ...>
[...]$ python3 ./convert.py
Traceback (most recent call last):
  File "/home/mhueschen/projects-reference/llama.cpp/./convert.py", line 40, in <module>
    from sentencepiece import SentencePieceProcessor
  File "/home/mhueschen/projects-reference/llama.cpp/.venv/lib/python3.11/site-packages/sentencepiece/__init__.py", line 13, in <module>
    from . import _sentencepiece
ImportError: libstdc++.so.6: cannot open shared object file: No such file or directory
```

however, I am not sure this is the cleanest way to address this linker issue...
2024-01-24  llama : pre-allocate input tensors in a separate buffer (#5100)  (slaren)
2024-01-23  metal : disable support for MUL_MAT F32 x F16  (Georgi Gerganov)
2024-01-23  Additional KL-divergence statistics (#5081)  (Kawrakow)
* perplexity: add top-token probability
* perplexity: add additional KL-divergence statistics
* perplexity: a better organized KL-divergence statistics output

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2024-01-23  CUDA: more info when no device code (#5088)  (Johannes Gäßler)
2024-01-23  minor : clean-up some warnings and style (#5094)  (Georgi Gerganov)
* minor : clean-up some warnings and style
  ggml-ci
* ggml : add comment
2024-01-23  devops : add intel oneapi dockerfile (#5068)  (Xuan Son Nguyen)
Co-authored-by: Xuan Son Nguyen <xuanson.nguyen@snowpack.eu>
2024-01-23  llama.vim : added api key support (#5090)  (Michael Coppola)
Co-authored-by: Michael Coppola <info@michaeljcoppola.com>
2024-01-22  llama : fix not enough space in buffer with Qwen (#5086)  (slaren)
2024-01-22  KL-divergence (#5076)  (Kawrakow)
* kl-divergence: be able to save all logits to a file
* Add ability to compute KL-divergence

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
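The quantity being added is the standard KL-divergence D_KL(P || Q) = sum_i p_i * log(p_i / q_i) between the token distributions of a reference model (P) and, for example, a quantized model (Q). A small illustrative sketch computing it from two logit vectors; the perplexity tool's own storage format and code differ:

```
#include <cmath>
#include <vector>

// Sketch: D_KL(P || Q) from two raw logit vectors of equal length
// (e.g. reference model vs. quantized model for the same context).
static double kl_divergence(const std::vector<float> & p_logits,
                            const std::vector<float> & q_logits) {
    auto softmax = [](const std::vector<float> & logits) {
        std::vector<double> probs(logits.size());
        double max_l = logits[0];
        for (float l : logits) {
            if (l > max_l) max_l = l; // subtract max for numerical stability
        }
        double sum = 0.0;
        for (size_t i = 0; i < logits.size(); ++i) {
            probs[i] = std::exp(logits[i] - max_l);
            sum += probs[i];
        }
        for (auto & p : probs) p /= sum;
        return probs;
    };
    const auto p = softmax(p_logits);
    const auto q = softmax(q_logits);
    double kl = 0.0;
    for (size_t i = 0; i < p.size(); ++i) {
        if (p[i] > 0.0) {
            kl += p[i] * std::log(p[i] / q[i]);
        }
    }
    return kl;
}
```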
2024-01-22  ggml : parallelize FP32 conversion when using BLAS (#5045)  (Reinforce-II)
* allow the GGML_TASK_INIT phase to run multithreaded
* multithreaded dequantize in mul_mat when using a BLAS library
* minor fixes
* update outdated comment
* fix coding style
* simplify code

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
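A rough illustration of splitting the dequantize-to-FP32 pass across threads before handing the result to BLAS; ggml routes this through its own task system, so the threading style and the dequantize_row signature below are assumptions, not the PR's code:

```
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Stand-in for a per-format ggml dequantization routine (signature assumed).
using dequantize_row_fn = void (*)(const void * src_row, float * dst_row, int64_t n_cols);

// Dequantize an [n_rows x n_cols] quantized matrix to FP32 in parallel,
// giving each thread a strided subset of rows, before calling e.g. sgemm.
static void dequantize_parallel(const void * src, size_t row_size_bytes,
                                float * dst, int64_t n_rows, int64_t n_cols,
                                dequantize_row_fn dequantize_row, int n_threads) {
    std::vector<std::thread> workers;
    for (int t = 0; t < n_threads; ++t) {
        workers.emplace_back([=] {
            for (int64_t r = t; r < n_rows; r += n_threads) {
                const char * src_row = static_cast<const char *>(src) + r * row_size_bytes;
                dequantize_row(src_row, dst + r * n_cols, n_cols);
            }
        });
    }
    for (auto & w : workers) {
        w.join();
    }
}
```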
2024-01-22  llava : MobileVLM support (#4954)  (XiaotaoChen)
* MobileVLM native implementation
* delete the code related to depthwise_conv_2d and permute_cpy, replace both with existing functions, optimize the ldp definition, and support the LLAMA_PERF option for CMake
* move the android script to the example/llava directory
* fix the editor config checks

Co-authored-by: Chenxiaotao03 <chenxiaotao03@meituan.com>
2024-01-22  flake.nix: add a comment about flakes vs nix  (Someone Serge)
2024-01-22  nix: add a comment on the many nixpkgs-with-cuda instances  (Someone Serge)
2024-01-22  nix: add a comment about makeScope  (Someone Serge)
2024-01-22  nix: refactor the cleanSource rules  (Someone Serge)
2024-01-22  workflows: nix-ci: drop the redundant "paths" filter  (Someone Serge)
2024-01-22  workflows: nix-build-aarch64: rate limit  (Someone Serge)
2024-01-22  workflows: nix-ci: rebuild on flake.lock updates  (Someone Serge)
2024-01-22  imatrix : keep intermediate imatrix results (#5077)  (Kawrakow)
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2024-01-22  llama : support StableLM 2 1.6B (#5052)  (compilade)
* llama : support StableLM 2 1.6B
* convert : fix Qwen's set_vocab wrongly naming all special tokens [PAD{id}]
* convert : refactor Qwen's set_vocab to use it for StableLM 2 too
* nix : add tiktoken to llama-python-extra
* convert : use presence of tokenizer.json to determine StableLM tokenizer loader
  It's a less arbitrary heuristic than the vocab size.
2024-01-22  finetune : print sample-start/include-sample-start (#5072)  (Daniel Bevenius)
This commit adds `--sample-start` and `--include-sample-start` to the output from the main function in finetune.cpp. The motivation is that even though these can be set explicitly by the user via the command line, one may forget to do so, and it is useful to have the values actually in use printed out; otherwise it is possible to go through the whole training process before realizing that the values are not what one expected.

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
2024-01-22  llama : add Q3_K_XS (#5060)  (Kawrakow)
* Add Q3_K_XS - an intermediate size between Q2_K and Q3_K_S
* Q3_K_XS: quantize the first 1/8 of the ffn_down layers with Q4_K
  Together with an importance matrix, this brings the perplexity of LLaMA-v2-70B below that of the former Q2_K, with an 800 MB smaller quantized model size.

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
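The one concrete rule described above, Q4_K for the first 1/8 of the ffn_down tensors and Q3_K elsewhere, can be sketched as a small selection function; the names and enum below are placeholders, not the actual llama.cpp quantization logic:

```
#include <string>

// Placeholder quant types for illustration; not the ggml enum values.
enum class quant_type { Q3_K, Q4_K };

// Hypothetical sketch of the Q3_K_XS mixing rule: the first 1/8 of the
// ffn_down tensors get the higher-precision Q4_K, everything else Q3_K.
static quant_type pick_quant_q3_k_xs(const std::string & tensor_name,
                                     int layer_idx, int n_layers) {
    const bool is_ffn_down = tensor_name.find("ffn_down") != std::string::npos;
    if (is_ffn_down && layer_idx < n_layers / 8) {
        return quant_type::Q4_K;
    }
    return quant_type::Q3_K;
}
```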
2024-01-22  ci : fix Windows CI by updating Intel SDE version (#5053)  (bobqianic)
2024-01-22  llama : add more qwen2 models (#5071)  (Shijie)
2024-01-21  Revert LLAMA_NATIVE to OFF in flake.nix (#5066)  (iSma)
2024-01-21  add safetensors support to convert-lora-to-ggml.py (#5062)  (kuronekosaiko)
* add safetensors support to convert-lora-to-ggml.py
* Update convert-lora-to-ggml.py
  Remove whitespace in line 69.
2024-01-21  add `#include <string>` to unicode.h (#5051)  (bobqianic)
Co-authored-by: Jared Van Bortel <jared@nomic.ai>
2024-01-21  Add ability to evaluate multiple choice tasks (#5047)  (Kawrakow)
* TruthfulQA: 1st attempt, does not look like it is working
  The same implementation can be used for HellaSwag as well, so I converted a HellaSwag validation dataset to the binary format used here and tested with that. The score is only around 50, so something is not quite right.
* TruthfulQA: works but the result is bad
  I know it works because if I convert the HellaSwag validation data to the binary format used in the truthful_qa_score() function I get the exact same result as from the hellaswag_score() function. But I guess the questions are tricky and the way I have combined question + answer is very likely not the best. The TruthfulQA validation dataset contains 817 questions, with a random-chance result around 19%. With this version I get 29.1% for Mistral-7B and 55.2% for Mistral-7B-Instruct-v0.2. The HF leaderboard results for these two models are 42.2% and 68.3%, respectively.
* TruthfulQA: fix random sample
* TruthfulQA: prepare tasks in parallel for large test datasets
* Rename truthful_qa to multiple_choice
* Make MSVC happy
  I had forgotten that MSVC does not make constexpr values available inside a lambda.

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
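Conceptually the scoring is: concatenate the question with each candidate answer, sum the log-probabilities of the answer tokens under the model, and pick the highest-scoring answer. A hedged sketch, where the task layout and the logprob_of_answer callback are assumptions rather than the binary format or API used by the perplexity tool:

```
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// One multiple-choice task; layout is an assumption for illustration only.
struct mc_task {
    std::string question;
    std::vector<std::string> answers;
    size_t label; // index of the correct answer
};

// logprob_of_answer(question, answer) is assumed to return the summed
// log-probabilities of the answer tokens given the question, as evaluated
// by the model. Returns the fraction of tasks answered correctly.
static double multiple_choice_score(
        const std::vector<mc_task> & tasks,
        const std::function<double(const std::string &, const std::string &)> & logprob_of_answer) {
    size_t n_correct = 0;
    for (const auto & task : tasks) {
        size_t best    = 0;
        double best_lp = -1e300; // effectively -infinity
        for (size_t i = 0; i < task.answers.size(); ++i) {
            const double lp = logprob_of_answer(task.question, task.answers[i]);
            if (lp > best_lp) {
                best_lp = lp;
                best    = i;
            }
        }
        n_correct += (best == task.label) ? 1 : 0;
    }
    return tasks.empty() ? 0.0 : static_cast<double>(n_correct) / tasks.size();
}
```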