Age | Commit message | Author
2023-09-03 | llama : fix bpe tokenize from byte (#2889) | opparco
2023-09-03 | metal : revert 6af0bab until we fix it | Georgi Gerganov
This restores the generated text to be the same as before #2959
2023-09-03 | cov : add Code Coverage and codecov.io integration (#2928) | Alon
* update .gitignore * makefile: add coverage support (lcov, gcovr) * add code-coverage workflow * update code coverage workflow * run on ubuntu 20.04 * use gcc-8 * check why the job hang * add env vars * add LLAMA_CODE_COVERAGE=1 again * - add CODECOV_TOKEN - add missing make lcov-report * install lcov * update make file -pb flag * remove unused GGML_NITER from workflows * wrap coverage output files in COV_TARGETS
2023-09-03 | opencl : fix a bug in ggml_cl_pool_malloc() for ggml_cl_mul_mat_f32() (#2955) | Wentai Zhang
Co-authored-by: Wentai Zhang <wentaizhang@tencent.com>
2023-09-03 | metal : more optimizations (#2959) | Kawrakow
* Very minor speedup via simd-group synchronization in f16 x f32 * Another very minor speedup on metal * Quite significant PP speedup on metal * Another attempt * Minor * Massive improvement for TG for fp16 * ~4-5% improvement for Q8_0 TG on metal --------- Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-09-03 | swift : add support for k-quants (#2983) | kchro3
2023-09-03 | convert.py : BPE fixes (#2938) | Kerfuffle
* convert.py: BPE fixes? * Remove unnecessary conditional in addl token error handling
2023-09-03 | docs : add `catai` to `README.md` (#2967) | Ido S
2023-09-03 | examples : fix gpt-neox (#2943) | momonga
Co-authored-by: mmnga <mmnga1mmnga@gmail.com>
2023-09-03 | swift : add missing c file to Package.swift (#2978) | kchro3
2023-09-03 | make : support overriding CFLAGS/CXXFLAGS/CPPFLAGS/LDFLAGS (#2886) | Cebtenzzre
* make : remove unused -DGGML_BIG_ENDIAN * make : put preprocessor stuff in CPPFLAGS * make : pass Raspberry Pi arch flags to g++ as well * make : support overriding CFLAGS/CXXFLAGS/CPPFLAGS/LDFLAGS * make : fix inverted conditional
2023-09-02 | logging: Fix creating empty file even when disabled (#2966) | Kerfuffle
* logging: Fix creating empty file even when disabled * Minor formatting fix Co-authored-by: staviq <staviq@gmail.com> --------- Co-authored-by: staviq <staviq@gmail.com>
2023-09-02 | readme : update clblast instructions (#2903) | bandoti
* Update Windows CLBlast instructions * Update Windows CLBlast instructions * Remove trailing whitespace
2023-09-02 | metal : show all Metal device instances in the system (#2952) | Karsten Weiss
* ggml_metal_init: Show all Metal device instances in the system Also show the default Metal device that was picked. * Update ggml-metal.m --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-09-02 | k-quants : fix build on armv7 (android only) (#2920) | Jhen-Jie Hong
* k-quants : fix build on armv7 * ggml : cleanup unused arm32 specific impl * k-quants : avoid some unused vzero / mzero define * ggml-alloc : use 4g for MEASURE_MAX_SIZE in 32-bit arm
2023-09-02 | server : avoid antiprompt in probabilities of final response (#2849) | Jhen-Jie Hong
2023-09-01 | cuda : vsubss4 for older versions of ROCm/clang (#2942) | Engininja2
2023-09-01 | readme : quick start command fix (#2908) | ZHAOKAI WANG
* quick start command fix * quick start win command fix
2023-09-01 | Allow quantize to only copy tensors, some other improvements (#2931) | Kerfuffle
* Allow quantize tool to only copy tensors to allow repackaging models. * Slightly better logic when requantizing. * Change help message to go to `stdout`.
2023-09-01 | llama2c : rename function | Georgi Gerganov
2023-09-01 | make : use unaligned vector moves on MinGW (#2945) | Cebtenzzre
Fixes #2922
2023-09-01 | minor : add const qualifiers (#2853) | m3ndax
* made the methods const # Conflicts: # examples/convert-llama2c-to-ggml/convert-llama2c-to-ggml.cpp * made method const * Update convert-llama2c-to-ggml.cpp removed write_raw and write_u32 * llama2c : remove misleading const --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-09-01 | docs : add java-llama.cpp to README.md (#2935) | Konstantin Herud
2023-09-01 | build : fix most gcc and clang warnings (#2861) | Cebtenzzre
* fix most gcc and clang warnings * baby-llama : remove commented opt_params_adam * fix some MinGW warnings * fix more MinGW warnings
2023-09-01 | examples : add C grammar (#2357) | Ben Siraphob
2023-09-01 | ggml : add RISC-V vector intrinsics support (#2929) | Tameem
* added support for RISCV CFLAGS & native compile + cross compile options * Add RISC-V Vector Intrinsics Support Added RVV intrinsics for following ggml_vec_dot_q4_0_q8_0 ggml_vec_dot_q4_1_q8_1 ggml_vec_dot_q5_0_q8_0 ggml_vec_dot_q5_1_q8_1 ggml_vec_dot_q8_0_q8_0 Co-authored-by: Sharafat <sharafat.hussain@10xengineers.ai> Signed-off-by: Ahmad Tameem <ahmad.tameem@10xengineers.ai> --------- Signed-off-by: Ahmad Tameem <ahmad.tameem@10xengineers.ai> Co-authored-by: moiz.hussain <moiz.hussain@10xengineers.ai> Co-authored-by: Sharafat <sharafat.hussain@10xengineers.ai>
2023-09-01 | metal : slight speed-up for add and mul kernels (#2917) | Georgi Gerganov
2023-09-01 | logs : fix mingw-like builds (fixes #2898) (#2911) | staviq
* fix mingw-like builds * formatting * make LOG_COMPAT easier to override and extend * simplify win detection * fix for #2940
2023-09-01 | llama2c : fix segfault and alloc-dealloc-mismatch (#2913) | Cebtenzzre
* llama2c : fix segfault if vocab is not found * llama2c : fix mismatch between new[] and delete * llama2c : fix basename on Windows * llama2c : use a destructor to prevent memory leaks
2023-09-01 | metal: somewhat faster f16 x f32 matrix multiply kernel (#2951) | Kawrakow
* Somewhat faster f16 x f32 matrix multiply kernel * Better use 32 thread groups for f16 x f32 --------- Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2023-08-31 | convert : fix another python 3.8 issue (#2949) | Cebtenzzre
2023-09-01 | remove convert-llama-7b-pth-to-gguf.py and convert-llama-hf-to-gguf.py (#2906) | slaren
2023-08-31 | scripts: Use local gguf package when running from repo (#2927) | Kerfuffle
* scripts: Use local gguf when running from repo
2023-08-31 | @vxiiduu's fix for PrefetchVirtualMemory (#2930) | DannyDaemonic
Reimplement fix for `PrefetchVirtualMemory`. Co-authored-by: vxiiduu <73044267+vxiiduu@users.noreply.github.com>
2023-08-31 | convert : fix python 3.8 support, modernize type annotations (#2916) | Cebtenzzre
* convert : fix python 3.8 support * convert : sort imports * convert : fix required parameters in convert-llama-ggmlv3-to-gguf * convert : fix mypy errors in convert-llama-ggmlv3-to-gguf * convert : use PEP 585 generics and PEP 604 unions Now that we have `from __future__ import annotations`, we can use this modern syntax in Python 3.7 instead of restricting support to Python 3.9 or 3.10 respectively. * gguf.py : a tuple is already a tuple * add mypy.ini * convert : add necessary `type: ignore` comments * gguf-py: bump version
2023-08-30 | CUDA: mul_mat_q=true llama_context_params default (#2912) | Johannes Gäßler
2023-08-30 | [Docker] fix tools.sh argument passing. (#2884) | Henri Vasserman
* [Docker] fix tools.sh argument passing. This should allow passing multiple arguments to containers with the full image that are using the tools.sh frontend. Fix from https://github.com/ggerganov/llama.cpp/issues/2535#issuecomment-1697091734
2023-08-30 | convert.py : use dir name to name the llama | Georgi Gerganov
2023-08-30 | examples : fix underscore in beam-search + .gitignore (close #2900) | Georgi Gerganov
2023-08-30 | gguf : add workflow for Pypi publishing (#2896) | M. Yusuf Sarıgöz
* gguf : add workflow for Pypi publishing * gguf : add workflow for Pypi publishing * fix trailing whitespace
2023-08-30 | make : add test and update CI (#2897) | alonfaraj
* build ci: run make test * makefile: - add all - add test * enable tests/test-tokenizer-0-llama * fix path to model * remove gcc-8 from macos build test * Update Makefile * Update Makefile
2023-08-30 | docs : add `node-llama-cpp` to `README.md` (#2885) | Gilad S
2023-08-30 | convert : various script cleanups/fixes + merges and special token handling (#2842) | Kerfuffle
* convert: Fix permute calls and method/func definitions * Cleanups for gguf-py * Minor types cleanups. * Initial implementation of handling merges and special tokens * convert: Handle special tokens and merges in vocab only mode convert: Vocab only mode no longer requires loading model tensors * gguf: Refactor tensor name mapping * convert: Fix type hint for special_token_types in SpecialVocab * Use common special vocab handling in various conversion scripts * First pass at implementing suggested changes * Second pass * gguf: SpecialVocab: Fix issue with special token content not in a dict gguf: SpecialVocab: Allow skipping handling of merges * convert-falcon-hf-to-gguf: Support --vocab-only option, bail out if no tokenizer.json * convert-gptneox-hf-to-gguf and convert: Only handle merges for BPE tokenizer * gguf: SpecialVocab: Actually set load_merges in object * Uniform args parsing and vocab only mode for convert examples * convert.py: Set gpt2 as tokenizer model when using BPE * Squish last type warning in gguf.py - yay!
2023-08-30 | llm.vim : stop generation at multiple linebreaks, bind to <F2> (#2879) | chaihahaha
2023-08-30 | main : log file (#2748) | staviq
* initial, base LOG macro * add *.log to .gitignore * added basic log file handler * reverted log auto endline to better mimic printf * remove atomics and add dynamic log target * log_enable/disable, LOG_TEE, basic usage doc * update .gitignore * mv include to common, params, help msg * log tostring helpers, token vectors pretty prints * main: replaced fprintf/LOG_TEE, some trace logging * LOG_DISABLE_LOGS compile flag, wrapped f in macros * fix LOG_TEELN and configchecker * stub LOG_DUMP_CMDLINE for WIN32 for now * fix msvc * cleanup main.cpp:273 * fix stray whitespace after master sync * log : fix compile warnings - do not use C++20 stuff - use PRIu64 to print uint64_t - avoid string copies by using const ref - fix ", ##__VA_ARGS__" warnings - compare strings with == and != * log : do not append to existing log + disable file line func by default * log : try to fix Windows build * main : wip logs * main : add trace log * review: macro f lowercase, str append to sstream * review: simplify ifs and str comparisons * fix MSVC, formatting, FMT/VAL placeholders * review: if/else cleanup * review: if/else cleanup (2) * replace _ prefix with _impl suffix --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-08-30 | tests : add a C compliance test (#2848) | Cebtenzzre
* tests : add a C compliance test * make : build C compliance test by default * make : fix clean and make sure C test fails on clang * make : move -Werror=implicit-int to CFLAGS
2023-08-29 | ggml : add view_src and view_offs to ggml_tensor for views (#2874) | slaren
* ggml : add view_src and view_offs * update ggml-alloc to use view_src * update ggml_diag_mask to work correctly with automatic inplace * exclude other ops that set an inplace flag from automatic inplace
2023-08-29 | remove outdated references to -eps and -gqa from README (#2881) | slaren
2023-08-29 | Tell users attempting to run perplexity with too few tokens to use more (#2882) | Kawrakow
Closes #2858 Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2023-08-29 | 10X faster BPE tokenizer (#2876) | Kawrakow
* 10X faster BPE tokenizer * Remove comment that no longer applies --------- Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>