Age  Commit message  Author
2023-11-13  sync : ggml (backend v2) (#3912)  Georgi Gerganov
* sync : ggml (backend v2) (wip)
* sync : migrate examples and llama.cpp to dynamic graphs (wip)
* sync : update tests + fix max op params to 64
ggml-ci
* sync : ggml-cuda
ggml-ci
* llama : fix save/load state context size
ggml-ci
* sync : try to fix build on tvOS
* sync : pass custom graph sizes in training examples
* sync : update graph copies to new ggml API
* sync : update sync-ggml.sh with new files
* scripts : fix header in sync script
* train : fix context size calculations
* llama : increase inference graph size up to 4096 nodes
* train : allocate grads for backward graphs
* train : allocate grads for gb_tmp
2023-11-13  Add ReLU and SQR CUDA ops to (partially) fix Persimmon offloading (#4041)  Kerfuffle
* Add ReLU and SQR CUDA ops to fix Persimmon offloading
* Persimmon loader: More helpful error on CUDA/ROCM when offloading too many layers
2023-11-12  gguf-py: gguf_writer: Use bytearray to build metadata (#4051)  Kerfuffle
* gguf-py: gguf_writer: Use BytesIO to build metadata
* Use bytearray instead
* Bump gguf-py package version
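A minimal sketch of the technique this commit describes — appending GGUF-style metadata into a mutable `bytearray` rather than round-tripping through `BytesIO`. The helper name and layout here are illustrative, not gguf-py's actual API:

```python
import struct

def pack_string(buf: bytearray, s: str) -> None:
    # GGUF strings are encoded as a uint64 byte length followed by UTF-8 data
    data = s.encode("utf-8")
    buf += struct.pack("<Q", len(data))
    buf += data

# Accumulate a small metadata blob in place; bytearray's += avoids the
# extra object and copy that a BytesIO write/getvalue round-trip involves.
meta = bytearray()
pack_string(meta, "general.name")
meta += struct.pack("<I", 42)  # an illustrative uint32 value

print(len(meta))  # 8 (length prefix) + 12 (string bytes) + 4 (uint32) = 24
```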
2023-11-11  Fix some documentation typos/grammar mistakes (#4032)  Richard Kiss
* typos
* Update examples/parallel/README.md
  Co-authored-by: Kerfuffle <44031344+KerfuffleV2@users.noreply.github.com>
---------
Co-authored-by: Kerfuffle <44031344+KerfuffleV2@users.noreply.github.com>
2023-11-11  Fix gguf-convert-endian script (#4037)  M. Yusuf Sarıgöz
* Fix gguf-convert-endian script
* Bump version and update description
2023-11-10  server : fix crash when prompt exceeds context size (#3996)  Alexey Parfenov
2023-11-11  gguf-py: Refactor and allow reading/modifying existing GGUF files (#3981)  Kerfuffle
* gguf-py: Refactor and add file reading support
* Replay changes from #3871
  Credit to @cebtenzzre for that pull
* Various type annotation fixes.
* sort imports with isort (again)
* Fix missing return statement in add_tensor
* style cleanup with flake8
* fix NamedTuple and Enum usage
* Fix an issue with state init in GGUFReader
  Move examples to an examples/ directory
  Clean up examples
  Add an example of modifying keys in a GGUF file
  Update documentation with info on examples
  Try to support people importing gguf/gguf.py directly
* Damagage is not a word.
* Clean up gguf-py/examples/modify_gguf.py whitespace
  Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>
* Update gguf-py/examples/modify_gguf.py formatting
  Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>
* Update gguf-py/gguf/gguf_reader.py type hint
  Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>
* Make examples executable, formatting changes
* Add more information to GGUFReader and examples comments
* Include a gguf Python package version bump
* Add convert-gguf-endian.py script
* cleanup
* gguf-py : bump minor version
* Reorganize scripts
* Make GGUFReader endian detection less arbitrary
* Add JSON dumping support to gguf-dump.py
  Which I kind of regret now
* A few gguf-dump.py cleanups
* Murder accidental tuple in gguf-py/scripts/gguf-dump.py
  Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>
* cleanup
* constants : remove unneeded type annotations
* fix python 3.8 compat
* Set up gguf- scripts in pyproject.toml
* And include scripts/__init__.py, derp
* convert.py: We can't currently support Q8_0 on big endian.
* gguf-py: SpecialVocab: Always try available sources for special token ids
  gguf-py: SpecialVocab: Try to load merges from merges.txt if not in tokenizer.json
  gguf-py: SpecialVocab: Add 'add_bos_token' type bools to GGUF metadata
* cleanup
* Promote add_X_token to GGUF metadata for BOS and EOS
---------
Co-authored-by: Jared Van Bortel <jared@nomic.ai>
Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>
2023-11-10  server : allow continue edit on completion mode (#3950)  Jhen-Jie Hong
* server : allow continue edit on completion mode
* server : handle abort case in runCompletion
* server : style improvement
2023-11-10  Unbreak persimmon after #3837 (#4010)  Galunid
2023-11-09  scripts: Generalize convert scripts (#3838)  Galunid
* Replace convert-*-hf-to-gguf.py files with convert-hf-to-gguf.py
2023-11-08  server : add min_p param (#3877)  Mihai
* Update server.cpp with min_p after it was introduced in https://github.com/ggerganov/llama.cpp/pull/3841
* Use spaces instead of tabs
* Update index.html.hpp after running deps.sh
* Fix test - fix line ending
2023-11-08  ggml-alloc : fix backend assignments of views (#3982)  slaren
2023-11-07  gguf : track writer state, free unneeded tensors, cleanup (#3871)  Jared Van Bortel
2023-11-07  make : do not add linker flags when compiling static llava lib (#3977)  Georgi Gerganov
2023-11-07  ggml : fix backward rope after YaRN (#3974)  xaedes
* fix backward process of rope
  The rope backward process was broken after the YaRN RoPE (#2268) implementation, due to missing changes in the backward functions. The code for the backward process is nearly identical to the forward process: the only difference is the sign of the sin-values. To avoid future regressions, remove the near-duplicate backward functions and reuse the forward code: for this, a new function argument `bool forward` was added to `ggml_compute_forward_rope_f32` and `ggml_compute_forward_rope_f16`. The sin-values are negated when forward is false.
* fix finetune rope call to use correct default attn_factor of 1.0f
* remove unused `ggml_rope_xpos_back`
  It is better to have only one `ggml_rope_back` function that accepts all rope parameters, so that `ggml_compute_backward` can propagate all parameters without having to switch between different rope_back variants.
* fix comments explaining the sine sign in ggml_forward_rope
* add missing function arguments in declaration
* fix function argument type in declaration
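The sign trick this commit relies on can be illustrated in isolation. RoPE rotates pairs of values by an angle; the backward pass is the inverse rotation, which differs only in the sign of the sine term. This is a hedged Python sketch of the idea, not the actual ggml kernels:

```python
import math

def rope_rotate(x0: float, x1: float, theta: float, forward: bool = True):
    # A 2-D rotation as applied by rotary embeddings. Since
    # cos(-t) = cos(t) and sin(-t) = -sin(t), the inverse (backward)
    # rotation is obtained by negating the sine value only.
    sin_t = math.sin(theta) if forward else -math.sin(theta)
    cos_t = math.cos(theta)
    return (x0 * cos_t - x1 * sin_t, x0 * sin_t + x1 * cos_t)

# Rotating forward and then backward recovers the original pair,
# which is why one routine with a `forward` flag can serve both passes.
y = rope_rotate(1.0, 2.0, 0.3, forward=True)
x = rope_rotate(y[0], y[1], 0.3, forward=False)
print(round(x[0], 6), round(x[1], 6))  # 1.0 2.0
```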
2023-11-07  Use params when loading models in llava-cli (#3976)  Matthew Tejo
llava-cli was loading models with default params and ignoring settings from the CLI. This switches to a generic function to load the params from the CLI options.
2023-11-07  cuda : support running on CPU for GGML_USE_CUBLAS=ON build (#3946)  Meng Zhang
* prototyping the idea of supporting running on CPU for a GGML_USE_CUBLAS=on build
* doc: add comments to ggml_cublas_loaded()
* fix defined(...)
2023-11-07  llava : expose as a shared library for downstream projects (#3613)  Damian Stewart
* wip llava python bindings compatibility
* add external llava API
* add base64 in-prompt image support
* wip refactor image loading
* refactor image load out of llava init
* cleanup
* further cleanup; move llava-cli into its own file and rename
* move base64.hpp into common/
* collapse clip and llava libraries
* move llava into its own subdir
* wip
* fix bug where base64 string was not removed from the prompt
* get libllava to output in the right place
* expose llava methods in libllama.dylib
* cleanup memory usage around clip_image_*
* cleanup and refactor *again*
* update headerdoc
* build with cmake, not tested (WIP)
* Editorconfig
* Build with make
* Fix cyclical deps on Windows
* attempt to fix build on Windows
* Upd TODOs
* attempt to fix build on Windows+CUDA
* Revert changes in cmake
* Fix according to review comments
* Support building as a shared library
* address review comments
---------
Co-authored-by: M. Yusuf Sarıgöz <yusufsarigoz@gmail.com>
Co-authored-by: Jared Van Bortel <jared@nomic.ai>
2023-11-05  ggml-cuda : fix f16 mul mat (#3961)  slaren
* ggml-cuda : fix f16 mul mat
ggml-ci
* silence common.cpp warning (bonus)
2023-11-05  Allow common process_escapes to handle \x sequences (#3928)  Kerfuffle
* Allow common process_escapes to handle \x sequences
* Fix edge case when second hex digit is NUL
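The edge case named above — a `\x` escape whose second hex digit is missing — is easy to get wrong by reading past the end of the buffer. A hedged Python sketch of the general technique (the real routine is C++ in llama.cpp's common code; names here are illustrative):

```python
HEX_DIGITS = "0123456789abcdefABCDEF"

def process_escapes(s: str) -> str:
    # Minimal shell-style escape processor with \xHH support.
    out = []
    i = 0
    while i < len(s):
        c = s[i]
        if c == "\\" and i + 1 < len(s):
            nxt = s[i + 1]
            if nxt == "n":
                out.append("\n"); i += 2; continue
            if nxt == "x":
                # Consume up to two hex digits, but accept a single one,
                # so a trailing "\x4" never reads past the end of input.
                j = i + 2
                hexpart = ""
                while j < len(s) and len(hexpart) < 2 and s[j] in HEX_DIGITS:
                    hexpart += s[j]
                    j += 1
                if hexpart:
                    out.append(chr(int(hexpart, 16)))
                    i = j
                    continue
            out.append(nxt); i += 2; continue
        out.append(c); i += 1
    return "".join(out)

print(repr(process_escapes(r"a\x41b")))  # 'aAb'
print(repr(process_escapes(r"\x4")))     # '\x04'
```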
2023-11-05  server : fix typo for --alias shortcut from -m to -a (#3958)  Thái Hoàng Tâm
2023-11-05  cuda : fix disabling device with --tensor-split 1,0 (#3951)  Jared Van Bortel
Co-authored-by: slaren <slarengh@gmail.com>
2023-11-05  llama : mark LLM_ARCH_STARCODER as full offload supported (#3945)  Meng Zhang
As done in https://github.com/ggerganov/llama.cpp/pull/3827
2023-11-05  cmake : MSVC instruction detection (fixed up #809) (#3923)  Eve
* Add detection code for avx
* Only check hardware when option is ON
* Modify per code review suggestions
* Local builds will detect the CPU
* Fix CMake style to use lowercase like everywhere else
* cleanup
* fix merge
* linux/gcc version for testing
* msvc combines avx2 and fma into /arch:AVX2 so check for both
* cleanup
* msvc only version
* style
* Update FindSIMD.cmake
---------
Co-authored-by: Howard Su <howard0su@gmail.com>
Co-authored-by: Jeremy Dunn <jeremydunn123@gmail.com>
2023-11-05  ci : use intel sde when ci cpu doesn't support avx512 (#3949)  Eve
2023-11-05  cuda : revert CUDA pool stuff (#3944)  slaren
* Revert "cuda : add ROCM aliases for CUDA pool stuff (#3918)"
  This reverts commit 629f917cd6b96ba1274c49a8aab163b1b189229d.
* Revert "cuda : use CUDA memory pool with async memory allocation/deallocation when available (#3903)"
  This reverts commit d6069051de7165a4e06662c89257f5d2905bb156.
ggml-ci
2023-11-04  gguf-py: Support 01.AI Yi models (#3943)  Kerfuffle
2023-11-03  metal : round up to 16 to fix MTLDebugComputeCommandEncoder assertion (#3938)  Peter Sugihara
2023-11-03  ggml-metal: fix yarn rope (#3937)  Xiao-Yong Jin
2023-11-03  ggml-cuda : move row numbers to x grid dim in mmv kernels (#3921)  slaren
2023-11-03  speculative : change default p_accept to 0.5 + CLI args (#3919)  Georgi Gerganov
ggml-ci
2023-11-03  common : YAYF (yet another YARN fix) (#3925)  Georgi Gerganov
ggml-ci
2023-11-03  llama : change yarn_ext_factor placeholder to -1 (#3922)  cebtenzzre
2023-11-02  cuda : add ROCM aliases for CUDA pool stuff (#3918)  Kerfuffle
2023-11-02  cmake : fix relative path to git submodule index (#3915)  Andrei
2023-11-02  readme : add notice about #3912  Georgi Gerganov
2023-11-02  cuda : fix const ptrs warning causing ROCm build issues (#3913)  Georgi Gerganov
2023-11-02  cuda : use CUDA memory pool with async memory allocation/deallocation when available (#3903)  Oleksii Maryshchenko
* Using cuda memory pools for async alloc/dealloc.
* If the cuda device doesn't support memory pools, use the old implementation.
* Removed redundant cublasSetStream
---------
Co-authored-by: Oleksii Maryshchenko <omaryshchenko@dtis.com>
2023-11-02  gguf : print error for GGUFv1 files (#3908)  Georgi Gerganov
2023-11-02  cmake : disable LLAMA_NATIVE by default (#3906)  slaren
2023-11-02  gguf : remove special-case code for GGUFv1 (#3901)  Georgi Gerganov
ggml-ci
2023-11-02  llm : prevent 1-D tensors from being GPU split (#3697)  Georgi Gerganov
2023-11-02  build : link against build info instead of compiling against it (#3879)  cebtenzzre
* cmake : fix build when .git does not exist
* cmake : simplify BUILD_INFO target
* cmake : add missing dependencies on BUILD_INFO
* build : link against build info instead of compiling against it
* zig : make build info a .cpp source instead of a header
  Co-authored-by: Matheus C. França <matheus-catarino@hotmail.com>
* cmake : revert change to CMP0115
---------
Co-authored-by: Matheus C. França <matheus-catarino@hotmail.com>
2023-11-02  cuda : check if this fixes Pascal card regression (#3882)  Georgi Gerganov
2023-11-02  metal : fix build errors and kernel sig after #2268 (#3898)  Georgi Gerganov
2023-11-02  cuda : fix RoPE after #2268 (#3897)  cebtenzzre
2023-11-01  llama : fix llama_context_default_params after #2268 (#3893)  cebtenzzre
2023-11-01  ggml-cuda : compute ptrs for cublasGemmBatchedEx in a kernel (#3891)  slaren
* ggml-cuda : compute ptrs for cublasGemmBatchedEx in a kernel
* fix warnings
2023-11-01  llama : implement YaRN RoPE scaling (#2268)  cebtenzzre
Co-authored-by: cebtenzzre <cebtenzzre@gmail.com>
Co-authored-by: Jeffrey Quesnelle <jquesnelle@gmail.com>
2023-11-01  llm : fix llm_build_kqv taking unused tensor (benign, #3837)  Georgi Gerganov