| Age | Commit message | Author |
|---|---|---|
| 2024-01-26 | Add OpenCL add kernel (#5151) | 0cc4m |
| 2024-01-26 | cmake : pass CPU architecture flags to nvcc (#5146) | Jared Van Bortel |
| 2024-01-26 | cuda : fix tensor size calculation for non-split buffer (#5145) | slaren |
| 2024-01-26 | ggml-alloc : add 10% margin to the buffer sizes (#5149) | slaren |
| 2024-01-26 | ggml : update softmax n_task calculation (#5126) | snadampal |
| 2024-01-26 | scripts : move run-with-preset.py from root to scripts folder | Georgi Gerganov |
| 2024-01-26 | tests : gitignore test-c.o | Georgi Gerganov |
| 2024-01-26 | server : refactored the task processing logic (#5065) | Xuan Son Nguyen |
| 2024-01-26 | ci : add model tests + script wrapper (#4586) | crasm |
| 2024-01-26 | metal : remove unused `n_buffers` and `buffers` (#5129) | Paul Tsochantaris |
| 2024-01-26 | gguf : fix "general.alignment" type in gguf_reader.py (#5136) | Riceball LEE |
| 2024-01-26 | readme : update hot topics | Georgi Gerganov |
| 2024-01-26 | Another bucket sort (#5109) | Kawrakow |
| 2024-01-25 | readme : add MobileVLM 1.7B/3B to the supported models list (#5107) | XiaotaoChen |
| 2024-01-25 | llama : dynamic temperature sampling (#4972) | l3utterfly |
| 2024-01-25 | examples : make pydantic scripts pass mypy and support py3.8 (#5099) | Jared Van Bortel |
| 2024-01-25 | android : use release cmake build type by default (#5123) | Valentin Konovalov |
| 2024-01-25 | Fix Q3_K_XS for MoE models (#5113) | Kawrakow |
| 2024-01-25 | metal : show compile log messages | Georgi Gerganov |
| 2024-01-24 | cuda : fix 2-bit quants on amd hip (#5105) | Engininja2 |
| 2024-01-24 | nix-shell: use addToSearchPath | Michael Hueschen |
| 2024-01-24 | nix: add cc to devShell LD_LIBRARY_PATH | Michael Hueschen |
| 2024-01-24 | llama : pre-allocate input tensors in a separate buffer (#5100) | slaren |
| 2024-01-23 | metal : disable support for MUL_MAT F32 x F16 | Georgi Gerganov |
| 2024-01-23 | Additional KL-divergence statistics (#5081) | Kawrakow |
| 2024-01-23 | CUDA: more info when no device code (#5088) | Johannes Gäßler |
| 2024-01-23 | minor : clean-up some warnings and style (#5094) | Georgi Gerganov |
| 2024-01-23 | devops : add intel oneapi dockerfile (#5068) | Xuan Son Nguyen |
| 2024-01-23 | llama.vim : added api key support (#5090) | Michael Coppola |
| 2024-01-22 | llama : fix not enough space in buffer with Qwen (#5086) | slaren |
| 2024-01-22 | KL-divergence (#5076) | Kawrakow |
| 2024-01-22 | ggml : parallelize FP32 conversion when using BLAS (#5045) | Reinforce-II |
| 2024-01-22 | llava : MobileVLM support (#4954) | XiaotaoChen |
| 2024-01-22 | flake.nix: add a comment about flakes vs nix | Someone Serge |
| 2024-01-22 | nix: add a comment on the many nixpkgs-with-cuda instances | Someone Serge |
| 2024-01-22 | nix: add a comment about makeScope | Someone Serge |
| 2024-01-22 | nix: refactor the cleanSource rules | Someone Serge |
| 2024-01-22 | workflows: nix-ci: drop the redundant "paths" filter | Someone Serge |
| 2024-01-22 | workflows: nix-build-aarch64: rate limit | Someone Serge |
| 2024-01-22 | workflows: nix-ci: rebuild on flake.lock updates | Someone Serge |
| 2024-01-22 | imatrix : keep intermediate imatrix results (#5077) | Kawrakow |
| 2024-01-22 | llama : support StableLM 2 1.6B (#5052) | compilade |
| 2024-01-22 | finetune : print sample-start/include-sample-start (#5072) | Daniel Bevenius |
| 2024-01-22 | llama : add Q3_K_XS (#5060) | Kawrakow |
| 2024-01-22 | ci : fix Windows CI by updating Intel SDE version (#5053) | bobqianic |
| 2024-01-22 | llama : add more qwen2 models (#5071) | Shijie |
| 2024-01-21 | Revert LLAMA_NATIVE to OFF in flake.nix (#5066) | iSma |
| 2024-01-21 | add safetensors support to convert-lora-to-ggml.py (#5062) | kuronekosaiko |
| 2024-01-21 | add `#include <string>` to unicode.h (#5051) | bobqianic |
| 2024-01-21 | Add ability to evauate multiple choice tasks (#5047) | Kawrakow |