2023-10-08  ci : enable on obj-c changes + fix metal build (#3540)  [Georgi Gerganov]
2023-10-08  zig : fix build by introducing train.cpp (#3539)  [Luo Tian]
2023-10-08  metal : support MTLGPUFamily < Apple7, formatting, style (#3524)  [Georgi Gerganov]
    * metal : improve decoding speed for batches of 2-16
    * metal : rename kernels mul_mat_ to mul_mv_
    * metal : indentations
    * minor
    * metal : print more GPU info + disable mul_mm for MTLGPUFamily < Apple7
2023-10-08  llama : fix missing break in Persimmon arch case statements (#3535)  [Kerfuffle]
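A minimal C++ sketch of the fall-through hazard this kind of fix addresses (the enum, function, and values are illustrative, not the actual llama.cpp code): a forgotten `break` in a `switch` over architectures silently lets one case overwrite another's setting.

```cpp
#include <cassert>
#include <string>

enum class arch { persimmon, falcon, other };

static std::string rope_type(arch a) {
    std::string t = "none";
    switch (a) {
        case arch::persimmon:
            t = "partial";
            break; // the fix: without this break, execution falls through
                   // and t is silently overwritten with the falcon setting
        case arch::falcon:
            t = "neox";
            break;
        default:
            break;
    }
    return t;
}
```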
2023-10-07  Fix trying to strip newline from empty prompt and cfg prompt file content (#3534)  [Kerfuffle]
2023-10-07  gguf.py : fix CI for publishing GGUF package (#3532)  [M. Yusuf Sarıgöz]
    * Fix CI for publishing GGUF package
    * Bump version
    * fix
    * bump version
    * bump version
    * bump version
2023-10-07  py : change version of numpy requirement to 1.24.4 (#3515)  [Tom C]
    Co-authored-by: Lyjia <me@lyjia.us>
2023-10-07  quantize : fail fast on write errors (#3521)  [cebtenzzre]
2023-10-07  metal : support default.metallib load & reuse code for swift package (#3522)  [Jhen-Jie Hong]
    * metal : support loading default.metallib & reuse code for swift package
    * metal : use SWIFT_PACKAGE def instead of define GGML_SWIFT
2023-10-07  llm : support Adept Persimmon 8B (#3410)  [Phillip Kravtsov]
    * Produces garbage output
    * wip: correct tensors up to RoPE
    * correct tensors thru RoPE
    * Correct outputs through masked & softmax'd KQ
    * fp32 works
    * Rename adept->persimmon
    * Produces correct outputs
    * clean up convert scripts
    * remove printing logic from ggml.c
    * remove prints from llama.cpp & fix merge
    * trivial cleanups
    * Add offload funcs
    * update conversion script to directly take adept artifacts rather than a .safetensors file
    * Fix norm eps bug
    * Support sqr and concat on metal, persimmon-8b-q4 runs correctly
    * Small changes from review
    * Formatting changes
    * Minor changes to conversion script
    * Remove old script
    * Fix editorconfig formatting
    * Fix build
    * add overlooked offload code
    ggml-ci
2023-10-07  Fix for #3454 (#3455)  [goerch]
    Fix: `sentencepiece` tokenizers with added tokens failed with an incorrect assertion
2023-10-06  readme : update models, cuda + ppl instructions (#3510)  [BarfingLemurs]
2023-10-06  server : docs fix default values and add n_probs (#3506)  [Mihai]
2023-10-06  kv cache slot search improvements (#3493)  [Kerfuffle]
    * kv cache slot search improvements
    * Use n_ctx in kv find slot for consistency
    * Ensure kv cache head points to a valid slot in llama_decode internal
    * Add some comments to prevent dumb people (like me) from getting confused
2023-10-06  prompts : fix editorconfig checks after #3416  [Georgi Gerganov]
2023-10-06  parallel : add option to load external prompt file (#3416)  [pudepiedj]
    * Enable external file and add datestamp
    * Add name of external file at end
    * Upload ToK2024
    * Delete ToK2024.txt
    * Experiments with jeopardy
    * Move ParallelQuestions to /prompts and rename
    * Interim commit
    * Interim commit
    * Final revision
    * Remove trailing whitespace
    * remove cmake_all.sh
    * Remove cmake_all.sh
    * Changed .gitignore
    * Improved reporting and new question files
    * Corrected typo
    * More LLM questions
    * Update LLM-questions.txt
    * Yet more LLM-questions
    * Remove jeopardy results file
    * Reinstate original jeopardy.sh
    * Update examples/parallel/parallel.cpp
    Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-10-06  server : reuse llama_sample_token common util (#3494)  [Jhen-Jie Hong]
    * server : reuse llama_sample_token common function
    * common : use n_probs for temperature sampling
2023-10-06  llama : correct hparams comparison (#3446)  [l3utterfly]
    * fixed floating point comparison issues
    * updated implementation for hparam comparison to handle inf and NaN
    * fixed code review comments
    * minor simplification
    * rename is_float_eq -> is_float_close
    Co-authored-by: Cebtenzzre <cebtenzzre@gmail.com>
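A sketch of the kind of tolerant comparison this change describes, assuming the stated requirements (exact equality catches matching infinities, NaN never compares close, finite values use a tolerance); the actual `is_float_close` in llama.cpp may differ in details such as using a relative epsilon.

```cpp
#include <cassert>
#include <cmath>

static bool is_float_close(float a, float b, float abs_tol) {
    if (a == b) {
        return true;  // equal finite values and equal infinities
    }
    if (std::isnan(a) || std::isnan(b) || std::isinf(a) || std::isinf(b)) {
        return false; // NaN and mismatched infinities are never close
    }
    return std::fabs(a - b) <= abs_tol;
}
```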
2023-10-06  ci : fix xcodebuild destinations (#3491)  [Jhen-Jie Hong]
    * ci : fix xcodebuild destinations
    * ci : add .swift to paths
2023-10-05  convert : update Falcon script for new HF config (#3448)  [cebtenzzre]
    Also adds Falcon-180B support. Closes #3049
    Co-authored-by: jb <jonathan.t.barnard@gmail.com>
2023-10-05  build : use std::make_tuple() for compatibility with older GCC versions (#3488)  [Kenvix ⭐]
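For context, a hedged sketch of why `std::make_tuple()` helps here (the function and values are illustrative): before the resolution of N4387 (adopted for C++17 and backported by newer compilers), older GCC treated `std::tuple`'s constructor as explicit, so returning a braced list `{n, s}` from a function declared to return a tuple could fail to compile, while `std::make_tuple()` works everywhere.

```cpp
#include <cassert>
#include <tuple>

static std::tuple<int, float> get_shape() {
    int   n = 4096;
    float s = 1.0f;
    // portable spelling; `return {n, s};` broke on older GCC because the
    // tuple constructor was explicit before the N4387 resolution
    return std::make_tuple(n, s);
}
```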
2023-10-05  common : process escape sequences in reverse prompts (#3461)  [staviq]
2023-10-05  CLBlast: Fix handling of on-device tensor data  [shibe2]
    Fix uploading tensor data to the device, including 3D, 4D, and non-contiguous tensors. Use correct offsets into data that is already in VRAM. Correct handling of OpenCL events when multiple commands are queued.
2023-10-05  server : fix incorrect num_tokens_predicted (#3480)  [Jhen-Jie Hong]
2023-10-05  swift : disable ACCELERATE_NEW_LAPACK (#3481)  [Jhen-Jie Hong]
2023-10-05  ci : add swift build via xcodebuild (#3482)  [Jhen-Jie Hong]
2023-10-04  convert : fix Baichuan2 models by using vocab size in config.json (#3299)  [Kerfuffle]
    Use the local GGUF package when possible in the Baichuan converter
2023-10-04  readme : add project status link  [Georgi Gerganov]
2023-10-04  ggml : fix build after #3329  [Georgi Gerganov]
2023-10-04  llm : add Refact model (#3329)  [ds5t5]
    * add refact model
    * resolve comments
    * rebase to the latest
    * solve alibi cpu error
    Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-10-04  sync : ggml (conv 1d + 2d updates, UB fixes) (#3468)  [Georgi Gerganov]
    * sync : ggml (conv 1d + 2d updates)
    * ggml : fix UB in q5_0 and q5_1 quantize code
      ggml.c:1033:39: runtime error: left shift of 1 by 31 places cannot be represented in type 'int'
      ggml.c:1081:39: runtime error: left shift of 1 by 31 places cannot be represented in type 'int'
      SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior
    * tests : fix UB in test-quantize-perf
    ggml-ci
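The sanitizer errors quoted in this commit come from shifting a signed `1` into the sign bit, which is undefined behavior in C and C++; shifting an unsigned constant is well defined. A minimal sketch of the fix pattern (illustrative helper, not the actual ggml quantization code):

```cpp
#include <cassert>
#include <cstdint>

// Building a single-bit mask: `1 << 31` is UB because the result does not
// fit in a signed int; `1u << 31` is well defined for all i in [0, 31].
static uint32_t bit_mask(int i) {
    return 1u << i; // fix: unsigned 1u, not signed 1
}
```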
2023-10-04  finetune : readme fix typo (#3465)  [Merrick Christensen]
    Fix small typo
2023-10-03  ggml : add RISC-V Vector support for K-quants and improve the existing intrinsics (#3453)  [Tameem]
    * Added RVV intrinsics support for Q8 quantize row and improved the existing dot product functions for RISC-V.
      RVV intrinsics are added for the quantize row functions quantize_row_q8_0 and quantize_row_q8_1.
      The following dot product functions have also been optimized by using LMUL = 1/2 instead of LMUL = 1:
      ggml_vec_dot_q4_0_q8_0, ggml_vec_dot_q4_1_q8_1, ggml_vec_dot_q5_0_q8_0, ggml_vec_dot_q5_1_q8_1.
      Vector initialization by a temporary array in Q5 is also replaced by the vid intrinsics.
      Signed-off-by: Ahmad Tameem <ahmad.tameem@10xengineers.ai>
    * Added RVV intrinsics support for k_quants, for both QKK = 256 and QKK = 64:
      ggml_vec_dot_q2_K_q8_K, ggml_vec_dot_q3_K_q8_K, ggml_vec_dot_q4_K_q8_K, ggml_vec_dot_q5_K_q8_K, ggml_vec_dot_q6_K_q8_K.
      Signed-off-by: Ahmad Tameem <ahmad.tameem@10xengineers.ai>
2023-10-03  main : consistent prefix/suffix coloring (#3425)  [h-h-h-h]
    * Typo
    * No `--in-prefix` coloring: the `--in-prefix` text was inconsistently colored. Now it's never colored, just like the `--in-suffix` text.
2023-10-03  llama : fix session saving/loading (#3400)  [Georgi Gerganov]
    * llama : fix session saving/loading
    * llama : temp fix for clearing "future" tokens from the KV cache
    * llama : fix handling of "future" tokens when loading sessions
    * llama : fix comments for llama_kv_cache API
2023-10-03  llama : expose model's rope_freq_scale in the API (#3418)  [Alex Klinkhamer]
    Exposed so it can be scaled further before creating a context.
2023-10-03  metal : alibi for arbitrary number of heads (#3426)  [Jiahao Li]
2023-10-03  cmake : make LLAMA_NATIVE flag actually use the instructions supported by the processor (#3273)  [Eve]
    * fix LLAMA_NATIVE
    * syntax
    * alternate implementation
    * my eyes must be getting bad...
    * set cmake LLAMA_NATIVE=ON by default
    * march=native doesn't work for ios/tvos, so disable for those targets; also see what happens if we use it on msvc
    * revert 8283237 and only allow LLAMA_NATIVE on x86, like the Makefile
    * remove -DLLAMA_MPI=ON
    Co-authored-by: netrunnereve <netrunnereve@users.noreply.github.com>
2023-10-03  Work on the BPE tokenizer (#3252)  [goerch]
    * Work on the BPE tokenizer: tokenizer tests work for Falcon-7B
    * Try to fix build problem
    * Fix debug assertion failure
    * Fix MSVC Unicode BOM problem
    * Cleanup and an improvement
    * Fix compiler warning
    * Cleanup
    * Test doesn't work over the full range of Unicode
    * Update .gitignore and Makefile
    * Another Makefile rule
    * Testing Aquila
    * Moving byte decoding back to `token_to_piece`, because everyone is using it
    * Guarding some unusable code paths
    * Streamlining code and adding some more assertions. Important change: added tokens are now classified as control tokens for BPE
    * Adding a comment
    * Adding another assertion
    * Fixed vocabulary guarding assertions
    * Fix PR for recent changes (several follow-up fixes)
    * Fixes for compiler warnings
    * Remove unused code
    * Fix initialization of static maps
    * Add scores and token types back, adapt gptneox
    * Update llama.cpp (Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>)
    * Update unicode.h (Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>)
    * Ported Starcoder and added some assertions
    * Fix coding style
    * Apply @jploski's fix for missing tokens
    Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-10-02  convert : fix vocab size when not defined in hparams (#3421)  [cebtenzzre]
2023-10-02  cmake : increase minimum version for add_link_options (#3444)  [cebtenzzre]
2023-10-02  CLBlast: Add broadcast support for matrix multiplication (#3402)  [shibe2]
    Broadcast src0 into src1 across dimensions 2 and 3 when needed. This is required for models that use GQA.
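A sketch of the broadcast index mapping this describes, assuming ggml's usual convention that src1's dims 2/3 are integer multiples of src0's (illustrative helper, not the actual CLBlast code): each src1 slice along dim 2 maps back to a src0 slice, so for GQA several query heads can share one KV head without materializing copies.

```cpp
#include <cassert>

// Map slice i12 of src1 back to the src0 slice it broadcasts from,
// where ne02 and ne12 are the dim-2 extents of src0 and src1.
static int src0_slice(int i12, int ne02, int ne12) {
    return i12 / (ne12 / ne02); // e.g. 32 query heads over 8 KV heads -> groups of 4
}
```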
2023-10-02  gguf : add BERT, MPT, and GPT-J arch info (#3408)  [cebtenzzre]
2023-10-02  gguf : general usability improvements (#3409)  [cebtenzzre]
2023-10-02  cmake : make CUDA flags more similar to the Makefile (#3420)  [cebtenzzre]
    * cmake : fix misuse of cxx_flags
    * cmake : make CUDA flags more similar to the Makefile
    * cmake : fix MSVC build
2023-10-02  finetune : fix #3404 (#3437)  [xaedes]
    The shapes for the init model of GQA models were wrong.
2023-10-02  metal : set log callback before initializing (#3427)  [Adrian]
2023-10-02  cmake : fix transient definitions in find pkg (#3411)  [bandoti]
2023-10-02  docker : ignore Git files (#3314)  [Kevin Ji]
2023-10-02  infill : add new example + extend server API (#3296)  [vvhg1]
    * vvhg-code-infill (#1)
    * infill in separate example (#2)
    * reverted changes to main and added infill example
    * cleanup
    * naming improvement
    * make : add missing blank line
    * fix missing semicolon
    * brought infill up to current main code
    * cleanup
    Co-authored-by: Cebtenzzre <cebtenzzre@gmail.com>