2023-10-06server : docs fix default values and add n_probs (#3506)Mihai
2023-10-06kv cache slot search improvements (#3493)Kerfuffle
* kv cache slot search improvements
* Use n_ctx in kv find slot for consistency
* Ensure kv cache head points to a valid slot in llama_decode internal
* Add some comments to prevent dumb people (like me) from getting confused.
2023-10-06prompts : fix editorconfig checks after #3416Georgi Gerganov
2023-10-06parallel : add option to load external prompt file (#3416)pudepiedj
* Enable external file and add datestamp
* Add name of external file at end
* Upload ToK2024
* Delete ToK2024.txt
* Experiments with jeopardy
* Move ParallelQuestions to /prompts and rename
* Interim commit
* Interim commit
* Final revision
* Remove trailing whitespace
* remove cmake_all.sh
* Remove cmake_all.sh
* Changed .gitignore
* Improved reporting and new question files.
* Corrected typo
* More LLM questions
* Update LLM-questions.txt
* Yet more LLM-questions
* Remove jeopardy results file
* Reinstate original jeopardy.sh
* Update examples/parallel/parallel.cpp
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-10-06server : reuse llama_sample_token common util (#3494)Jhen-Jie Hong
* server : reuse llama_sample_token common function
* common : use n_probs for temperature sampling
2023-10-06llama : correct hparams comparison (#3446)l3utterfly
* fixed floating point comparison issues
* updated implementation for hparam comparison to handle inf and NaN
* fixed code review comments
* minor simplification
* rename is_float_eq -> is_float_close
---------
Co-authored-by: Cebtenzzre <cebtenzzre@gmail.com>
2023-10-06ci : fix xcodebuild destinations (#3491)Jhen-Jie Hong
* ci : fix xcodebuild destinations
* ci : add .swift to paths
2023-10-05convert : update Falcon script for new HF config (#3448)cebtenzzre
Also adds Falcon-180B support. Closes #3049
Co-authored-by: jb <jonathan.t.barnard@gmail.com>
2023-10-05build : use std::make_tuple() for compatibility with older GCC versions (#3488)Kenvix ⭐
2023-10-05common : process escape sequences in reverse prompts (#3461)staviq
2023-10-05CLBlast: Fix handling of on-device tensor datashibe2
Fix uploading tensor data to device, including 3D, 4D, and non-contiguous tensors. Use correct offsets into data that is already in VRAM. Correct handling of OpenCL events when multiple commands are queued.
2023-10-05server : fix incorrect num_tokens_predicted (#3480)Jhen-Jie Hong
2023-10-05swift : disable ACCELERATE_NEW_LAPACK (#3481)Jhen-Jie Hong
2023-10-05ci : add swift build via xcodebuild (#3482)Jhen-Jie Hong
2023-10-04convert : fix Baichuan2 models by using vocab size in config.json (#3299)Kerfuffle
Use local GGUF package when possible in Baichuan converter
2023-10-04readme : add project status linkGeorgi Gerganov
2023-10-04ggml : fix build after #3329Georgi Gerganov
2023-10-04llm : add Refact model (#3329)ds5t5
* add refact model
* resolve comments
* rebase to the latest
* solve alibi cpu error
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-10-04sync : ggml (conv 1d + 2d updates, UB fixes) (#3468)Georgi Gerganov
* sync : ggml (conv 1d + 2d updates) ggml-ci
* ggml : fix UB in q5_0 and q5_1 quantize code
  ggml.c:1033:39: runtime error: left shift of 1 by 31 places cannot be represented in type 'int'
  SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior
  ggml.c:1081:39: runtime error: left shift of 1 by 31 places cannot be represented in type 'int'
  SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior
  ggml-ci
* tests : fix UB in test-quantize-perf
2023-10-04finetune : readme fix typo (#3465)Merrick Christensen
Fix small typo
2023-10-03ggml : add RISC-V Vector Support for K-Quants and improved the existing intrinsics (#3453)Tameem
* Added RVV intrinsics support for Q8 quantize row and also improved the existing dot product function for risc-v. The RVV intrinsics are added for the following quantize row functions: quantize_row_q8_0, quantize_row_q8_1. The following dot product functions have also been optimized by using LMUL = 1/2 instead of LMUL = 1: ggml_vec_dot_q4_0_q8_0, ggml_vec_dot_q4_1_q8_1, ggml_vec_dot_q5_0_q8_0, ggml_vec_dot_q5_1_q8_1. Vector initialization in Q5 by a temporary array is also replaced by the vid intrinsics.
  Signed-off-by: Ahmad Tameem <ahmad.tameem@10xengineers.ai>
* Added RVV intrinsics support for k_quants. This adds RISC-V Vector intrinsics support for the following K_quants functions, for both QKK = 256 and QKK = 64: ggml_vec_dot_q2_K_q8_K, ggml_vec_dot_q3_K_q8_K, ggml_vec_dot_q4_K_q8_K, ggml_vec_dot_q5_K_q8_K, ggml_vec_dot_q6_K_q8_K.
  Signed-off-by: Ahmad Tameem <ahmad.tameem@10xengineers.ai>
---------
Signed-off-by: Ahmad Tameem <ahmad.tameem@10xengineers.ai>
2023-10-03main : consistent prefix/suffix coloring (#3425)h-h-h-h
* Typo
* No `--in-prefix` coloring
  The `--in-prefix` text was inconsistently colored. Now, it's never colored, just like the `--in-suffix` text.
2023-10-03llama : fix session saving/loading (#3400)Georgi Gerganov
* llama : fix session saving/loading
* llama : temp fix for clearing "future" tokens from the KV cache
* llama : fix handling of "future" tokens when loading sessions
* llama : fix comments for llama_kv_cache API
2023-10-03llama : expose model's rope_freq_scale in the API (#3418)Alex Klinkhamer
so it can be scaled further before creating a context.
2023-10-03metal : alibi for arbitrary number of heads (#3426)Jiahao Li
2023-10-03cmake : make LLAMA_NATIVE flag actually use the instructions supported by the processor (#3273)Eve
* fix LLAMA_NATIVE
* syntax
* alternate implementation
* my eyes must be getting bad...
* set cmake LLAMA_NATIVE=ON by default
* march=native doesn't work for ios/tvos, so disable for those targets. also see what happens if we use it on msvc
* revert 8283237 and only allow LLAMA_NATIVE on x86 like the Makefile
* remove -DLLAMA_MPI=ON
---------
Co-authored-by: netrunnereve <netrunnereve@users.noreply.github.com>
2023-10-03Work on the BPE tokenizer (#3252)goerch
* Work on the BPE tokenizer
  Tokenizer tests work for Falcon-7B
* Try to fix build problem
* Fix debug assertion failure
* Fix MSVC Unicode BOM problem
* Cleanup and an improvement
* Fix compiler warning
* Cleanup
* Test doesn't work over the full range of Unicode
* Update .gitignore and Makefile
* Another Makefile rule
* Testing Aquila
* Moving byte decoding back to `token_to_piece` ... because everyone is using it.
* Guarding some unusable code paths
* Streamlining code and adding some more assertions
  Important change: I'm classifying added tokens as control tokens now for BPE.
* Adding a comment
* Adding another assertion
* Fixed vocabulary guarding assertions
* Fix PR for recent change
* Fix PR for recent change
* Fix for compiler warning
* Fix PR for recent change
* Fix PR for recent change
* Fix PR for recent change
* Fix for compiler warning
* Fixes for more compiler warnings
* Remove unused code
* Fix initialization of static maps
* Add scores and token types back, adapt gptneox
* Update llama.cpp
  Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Update unicode.h
  Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Update unicode.h
  Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Ported Starcoder and added some assertions
* Fix coding style
* Apply @jploski's fix for missing tokens
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-10-02convert : fix vocab size when not defined in hparams (#3421)cebtenzzre
2023-10-02cmake : increase minimum version for add_link_options (#3444)cebtenzzre
2023-10-02CLBlast: Add broadcast support for matrix multiplication (#3402)shibe2
Broadcast src0 into src1 across dimensions 2 and 3 when needed. This is required for models that use GQA.
2023-10-02gguf : add BERT, MPT, and GPT-J arch info (#3408)cebtenzzre
2023-10-02gguf : general usability improvements (#3409)cebtenzzre
2023-10-02cmake : make CUDA flags more similar to the Makefile (#3420)cebtenzzre
* cmake : fix misuse of cxx_flags
* cmake : make CUDA flags more similar to the Makefile
* cmake : fix MSVC build
2023-10-02finetune : fix #3404 (#3437)xaedes
The shapes for the init model of GQA models were wrong.
2023-10-02metal : set log callback before initializing (#3427)Adrian
2023-10-02cmake : fix transient definitions in find pkg (#3411)bandoti
2023-10-02docker : ignore Git files (#3314)Kevin Ji
2023-10-02infill : add new example + extend server API (#3296)vvhg1
* vvhg-code-infill (#1)
* infill in separate example (#2)
* reverted changes to main and added infill example
* cleanup
* naming improvement
* make : add missing blank line
* fix missing semicolon
* brought infill up to current main code
* cleanup
---------
Co-authored-by: Cebtenzzre <cebtenzzre@gmail.com>
2023-09-30ggml-cuda : perform cublas mat mul of quantized types as f16 (#3412)slaren
* ggml-cuda : perform cublas matrix multiplication of quantized types as fp16
* rename CC_TURING to CC_VOLTA
* disable fp16 mat mul completely with multi GPU
2023-09-29llama.cpp : add documentation about rope_freq_base and scale values (#3401)slaren
* llama.cpp : add documentation about rope_freq_base and scale values
* add notice to hot topics
2023-09-29train : fix KQ_pos allocation (#3392)Georgi Gerganov
* train : fix KQ_pos allocation
* make sure KQ_pos is not reallocated in finetune
---------
Co-authored-by: xaedes <xaedes@gmail.com>
2023-09-29llama : quantize up to 31% faster on Linux and Windows with mmap (#3206)Cebtenzzre
* llama : enable mmap in quantize on Linux -> 31% faster
* also enable mmap on Windows
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-09-29readme : update hot topics + model links (#3399)BarfingLemurs
2023-09-29readme : add link to grammars app (#3388)Andrew Duffy
* Add link to grammars app per @ggerganov suggestion
  Adding a sentence in the Grammars section of README to point to the grammar app, per https://github.com/ggerganov/llama.cpp/discussions/2494#discussioncomment-7138211
* Update README.md
2023-09-29swift : fix build on xcode 15 (#3387)Jhen-Jie Hong
2023-09-28build : enable more non-default compiler warnings (#3200)Cebtenzzre
2023-09-28ggml_tensor: update the structure comments. (#3283)Hua Jiang
* ggml_tensor: update the structure comments.
* remove semicolon
  Co-authored-by: slaren <slarengh@gmail.com>
* Update ggml.h
---------
Co-authored-by: Cebtenzzre <cebtenzzre@gmail.com>
Co-authored-by: slaren <slarengh@gmail.com>
2023-09-28ggml : release the requested thread pool resource (#3292)Qu Zongfu
* Release the requested thread pool resource
* Release the requested thread pool resource 2
---------
Co-authored-by: Zongfu ZF3 Qu <quzf3@Lenovo.com>
2023-09-28llama.cpp : split llama_context_params into model and context params (#3301)slaren
* llama.cpp : split llama_context_params into model and context params ggml-ci
* fix metal build
* fix freq_base/scale default to model value
* llama-bench : keep the same model between tests when possible
* move n_threads to llama_context_params, add n_threads_batch
* fix mpi build
* remove kv_size(), cuda scratch fixes
* remove low-vram option
* add n_threads_batch to system info, refactor to get_system_info()
* add documentation about --threads-batch to the READMEs
* llama-bench fix
* main : fix rope freq/scale warning
* llama.cpp : add llama_get_model
  common : add llama_tokenize from model
* remove duplicated ctx/model functions ggml-ci
* cuda : print total VRAM used
2023-09-28ci : multithreaded builds (#3311)Eve
* mac and linux threads
* windows
* Update build.yml
* Update build.yml
* Update build.yml
* automatically get thread count
* windows syntax
* try to fix freebsd
* Update build.yml
* Update build.yml
* Update build.yml