2023-10-11  common : fix mirostat state when using multiple sequences (#3543)  [Kerfuffle]
    * Fix mirostat state when using multiple sequences
    * Fix mirostat by completely refactoring sampling!
    * Try to fix zig build.
    * Export function to fetch/create default sampler states; code formatting cleanups and some comments; silence a warning about id not being used when logging is disabled
    * Apply some renaming suggestions; fix comments that were out of sync with the pull.
    * Use more consistent naming convention for sampling contexts
2023-10-11  batched : add bench tool (#3545)  [Georgi Gerganov]
    * batched : add bench tool
    * batched : minor fix table
    * batched-bench : add readme + n_kv_max is now configurable
    * batched-bench : init warm-up batch
    * batched-bench : pass custom set of PP, TG and PL
    * batched-bench : add mmq CLI arg
2023-10-11  examples : add batched.swift + improve CI for swift (#3562)  [Zane Shannon]
2023-10-10  Add MPT model to supported models in README.md (#3574)  [Galunid]
2023-10-10  Minor improvements in GPT2 tokenizer (#3567)  [goerch]
    * Fix minor bugs in bpe_gpt2_preprocess
    * Don't add bos token in test
2023-10-10  readme : add bloom (#3570)  [Xingchen Song (宋星辰)]
2023-10-10  llm : add bloom models (#3553)  [Xingchen Song (宋星辰)]
    * feat: Support bloom models
    * fix(bloom): fix model size
    Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-10-10  swift : improvements and fixes (#3564)  [Jhen-Jie Hong]
    * swift : use macOS 12 as minimum requirement
    * swift : add missing ggml-backend.c source
    * swift : add -O3 -DNDEBUG unsafe flags
2023-10-10  llm : add MPT support (#3417)  [Jan Ploski]
    * CUDA : added support for ggml_clamp (see also: https://github.com/ggerganov/ggml/issues/545)
    * mpt : added an implementation based (mostly) on the falcon integration, modified with deltas from ggml/examples/mpt
    * mpt : protect against "clip_qkv": null in mpt-7b
    * mpt : quick fix to avoid "Strange model" warning when quantizing MPT models
    * mpt : addendum to changeset 84e30e8 - leave parameter clamp_kqv out of the metadata rather than use 0.0 to indicate "no clamping" (more compliant with the current GGUF spec?)
    * mpt : standardized all tensor names to follow the GGUF spec
    * mpt : addendum to changeset 1be89c40 - use the "req" parameter of the GGUF_GET_KEY macro instead of duplicate code
    * mpt : fixed comment s/gptneox/mpt/
    * mpt : removed tabs and trailing whitespace
    * mpt : removed the ne01 + n_past == ne00 assertion from alibi (cuda/f32) and rope_shift from build_mpt
    * mpt : updated convert-mpt-hf-to-gguf.py to reflect changes made to convert-gptneox-hf-to-gguf.py in PR #3252
    * mpt : comment out n_past instead of marking it unused
    * mpt : removed hardcoded +178 from convert script in favor of utilizing hparams["vocab_size"]
    * mpt : removed unused tokenizer_json in convert script
    * ggml : removed obsolete n_past assert in ggml_alibi
    * llama : print clamp_kqv and max_alibi_bias hparams
    Co-authored-by: Cebtenzzre <cebtenzzre@gmail.com>
    Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-10-10  infill : fix tokenization (#3508)  [vvhg1]
    * infill tokens correction
    * server infill tokens correction
    * remove any leading whitespace from infill suffix and remove leading space token from suffix when params.escape
    * only remove when params.escape; remove space if possible, which is added back, or remove added space token
    * Revert "only rm when params.escape, rm space if possible which is added back or rm added space token" (reverts commit 63ba0b621f21077c0e3bc6ba6a327534123cb738)
    * fix interactive prompt escaping and fix server infill leading space handling
    * remove unnecessary bool check
2023-10-09  ggml-alloc : fix assert in debug builds (#3555)  [slaren]
2023-10-09  refact : fix convert script + zero out KV cache to avoid NaNs (#3523)  [Georgi Gerganov]
    * refact : fix convert script + zero out KV cache to avoid NaNs
    * ggml : silu(-inf) should never happen
    * metal : assert various kernel requirements
2023-10-09  metal : do not use mul_mm kernels when ne00 < 64 (#3542)  [Georgi Gerganov]
2023-10-08  sync : ggml (ggml-backend) (#3548)  [Georgi Gerganov]
    * sync : ggml (ggml-backend)
    * zig : add ggml-backend to the build
2023-10-08  ci : add Zig CI/CD and fix build (#2996)  [Matheus C. França]
    * zig CI/CD and fix build
    * fix build_compiler
    * ci : remove trailing whitespace
    Signed-off-by: Matheus Catarino França <matheus-catarino@hotmail.com>
    Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-10-08  api_like_OAI.py : compat with Microsoft Guidance (#2746)  [Ryder Wishart]
    Check for None in addition to the empty-string check in all request params.
    Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-10-08  api_like_OAI.py : simplify function (#2796)  [arcrank]
2023-10-08  k-quants : fix comments about block sizing (#3499)  [Johannes Rudolph]
2023-10-08  ci : enable on obj-c changes + fix metal build (#3540)  [Georgi Gerganov]
2023-10-08  zig : fix build by introducing train.cpp (#3539)  [Luo Tian]
2023-10-08  metal : support MTLGPUFamily < Apple7, formatting, style (#3524)  [Georgi Gerganov]
    * metal : improve decoding speed for batches of 2-16
    * metal : rename kernels mul_mat_ to mul_mv_
    * metal : fix indentations
    * minor
    * metal : print more GPU info + disable mul_mm for MTLGPUFamily < Apple7
2023-10-08  llama : fix missing break in Persimmon arch case statements (#3535)  [Kerfuffle]
2023-10-07  Fix trying to strip newline from empty prompt and cfg prompt file content (#3534)  [Kerfuffle]
2023-10-07  gguf.py : fix CI for publishing GGUF package (#3532)  [M. Yusuf Sarıgöz]
    * Fix CI for publishing GGUF package
    * Bump version
    * fix
    * bump version
    * bump version
    * bump version
2023-10-07  py : change version of numpy requirement to 1.24.4 (#3515)  [Tom C]
    Co-authored-by: Lyjia <me@lyjia.us>
2023-10-07  quantize : fail fast on write errors (#3521)  [cebtenzzre]
2023-10-07  metal : support default.metallib load & reuse code for swift package (#3522)  [Jhen-Jie Hong]
    * metal : support load default.metallib & reuse code for swift package
    * metal : use SWIFT_PACKAGE def instead of define GGML_SWIFT
2023-10-07  llm : support Adept Persimmon 8B (#3410)  [Phillip Kravtsov]
    * Produces garbage output
    * wip: correct tensors up to RoPE
    * correct tensors thru RoPE
    * Correct outputs through masked & softmax'd KQ
    * fp32 works
    * Rename adept->persimmon
    * Produces correct outputs
    * clean up convert scripts
    * remove printing logic from ggml.c
    * remove prints from llama.cpp & fix merge
    * trivial cleanups
    * Add offload funcs
    * update conversion script to directly take adept artifacts rather than .safetensors file
    * Fix norm eps bug
    * Support sqr and concat on metal, persimmon-8b-q4 runs correctly
    * Small changes from review
    * Formatting changes
    * Minor changes to conversion script
    * Remove old script
    * Fix editorconfig formatting
    * Fix build
    * add overlooked offload code
2023-10-07  Fix for #3454 (#3455)  [goerch]
    Fix: `sentencepiece` tokenizers with added tokens failed with an incorrect assertion.
2023-10-06  readme : update models, cuda + ppl instructions (#3510)  [BarfingLemurs]
2023-10-06  server docs : fix default values and add n_probs (#3506)  [Mihai]
2023-10-06  kv cache slot search improvements (#3493)  [Kerfuffle]
    * kv cache slot search improvements
    * Use n_ctx in kv find slot for consistency
    * Ensure kv cache head points to a valid slot in llama_decode internal
    * Add some comments to prevent dumb people (like me) from getting confused.
2023-10-06  prompts : fix editorconfig checks after #3416  [Georgi Gerganov]
2023-10-06  parallel : add option to load external prompt file (#3416)  [pudepiedj]
    * Enable external file and add datestamp
    * Add name of external file at end
    * Upload ToK2024
    * Delete ToK2024.txt
    * Experiments with jeopardy
    * Move ParallelQuestions to /prompts and rename
    * Interim commit
    * Interim commit
    * Final revision
    * Remove trailing whitespace
    * Remove cmake_all.sh
    * Changed .gitignore
    * Improved reporting and new question files
    * Corrected typo
    * More LLM questions
    * Update LLM-questions.txt
    * Yet more LLM-questions
    * Remove jeopardy results file
    * Reinstate original jeopardy.sh
    * Update examples/parallel/parallel.cpp
    Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-10-06  server : reuse llama_sample_token common util (#3494)  [Jhen-Jie Hong]
    * server : reuse llama_sample_token common function
    * common : use n_probs for temperature sampling
2023-10-06  llama : correct hparams comparison (#3446)  [l3utterfly]
    * fixed floating point comparison issues
    * updated implementation for hparam comparison to handle inf and NaN
    * fixed code review comments
    * minor simplification
    * rename is_float_eq -> is_float_close
    Co-authored-by: Cebtenzzre <cebtenzzre@gmail.com>
2023-10-06  ci : fix xcodebuild destinations (#3491)  [Jhen-Jie Hong]
    * ci : fix xcodebuild destinations
    * ci : add .swift to paths
2023-10-05  convert : update Falcon script for new HF config (#3448)  [cebtenzzre]
    Also adds Falcon-180B support. Closes #3049.
    Co-authored-by: jb <jonathan.t.barnard@gmail.com>
2023-10-05  build : use std::make_tuple() for compatibility with older GCC versions (#3488)  [Kenvix ⭐]
2023-10-05  common : process escape sequences in reverse prompts (#3461)  [staviq]
2023-10-05  CLBlast : fix handling of on-device tensor data  [shibe2]
    Fix uploading tensor data to the device, including 3D, 4D, and non-contiguous tensors. Use correct offsets into data that is already in VRAM. Correct handling of OpenCL events when multiple commands are queued.
2023-10-05  server : fix incorrect num_tokens_predicted (#3480)  [Jhen-Jie Hong]
2023-10-05  swift : disable ACCELERATE_NEW_LAPACK (#3481)  [Jhen-Jie Hong]
2023-10-05  ci : add swift build via xcodebuild (#3482)  [Jhen-Jie Hong]
2023-10-04  convert : fix Baichuan2 models by using vocab size in config.json (#3299)  [Kerfuffle]
    Use local GGUF package when possible in Baichuan converter.
2023-10-04  readme : add project status link  [Georgi Gerganov]
2023-10-04  ggml : fix build after #3329  [Georgi Gerganov]
2023-10-04  llm : add Refact model (#3329)  [ds5t5]
    * add refact model
    * resolve comments
    * rebase to the latest
    * solve alibi cpu error
    Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-10-04  sync : ggml (conv 1d + 2d updates, UB fixes) (#3468)  [Georgi Gerganov]
    * sync : ggml (conv 1d + 2d updates)
    * ggml : fix UB in q5_0 and q5_1 quantize code
      ggml.c:1033:39: runtime error: left shift of 1 by 31 places cannot be represented in type 'int'
      ggml.c:1081:39: runtime error: left shift of 1 by 31 places cannot be represented in type 'int'
      SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior
    * tests : fix UB in test-quantize-perf
2023-10-04  finetune : readme fix typo (#3465)  [Merrick Christensen]
    Fix small typo