2023-10-18 metal : implement q5_0 and q5_1 kernels (#3648) (Jhen-Jie Hong)
* metal : implement dequantize_q5_0
* metal : block_q_n_dot_y for block_q5_0 (broken)
* metal : revert unnecessary change
* metal : implement dequantize_q5_1
* metal : block_q_n_dot_y for q5_1 (broken)
* metal : fix block_q_n_dot_y
* minor : spaces / formatting

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-10-18 opencl : fix element-wise multiplication (#3656) (shibe2)
2023-10-17 fix embeddings when using CUDA (#3657) (slaren)
2023-10-17 llama : avoid fprintf in favor of LLAMA_LOG (#3538) (Georgi Gerganov)
2023-10-17 readme : update hot-topics & models, detail windows release in usage (#3615) (BarfingLemurs)
* Update README.md
* Update README.md
* Update README.md
* move "Running on Windows" section below "Prepare data and run"

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-10-17 CLBlast: Fix temporary buffer size for f16 conversion (wsize) (shibe2)
Fix buffer overflow. Reduce the size to fit just one 2D slice. Assert sufficient size.
2023-10-17 train-text-from-scratch : fix assert failure in ggml-alloc (#3618) (slaren)
2023-10-17 editorconfig : remove trailing spaces (Georgi Gerganov)
2023-10-17 server : documentation of JSON return value of /completion endpoint (#3632) (coezbek)
* Added documentation of JSON return value of /completion endpoint
* Update examples/server/README.md

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-10-17 save-load-state : fix example + add ci test (#3655) (Georgi Gerganov)
* save-load-state : fix example (close #3606)
* ci : add test for save-load-state example

ggml-ci
2023-10-17 readme : add Aquila2 links (#3610) (ldwang)
Signed-off-by: ldwang <ftgreat@gmail.com>
Co-authored-by: ldwang <ftgreat@gmail.com>
2023-10-17 tokenizer : special token handling (#3538) (staviq)
* Rewrite special token handling from #1931
* shorten param name, add st verification by type
* use offsets instead of copy by substr
* formatting, remove copying iterator on delete
* llama : normalize code-style
* swift fix
* print pfx/sfx if verb, main: split pfx input sfx
* don't add space when using special tokens
* minor : comment + spacing

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-10-17 k-quants : fix quantization ranges (#3646) (Georgi Gerganov)
2023-10-16 llava : fix tokenization to not add bos between image embeddings and user prompt (#3645) (Georgi Gerganov)
* llava : fix tokenization to not add bos after system prompt
* set seed

Co-authored-by: M. Yusuf Sarıgöz <yusufsarigoz@gmail.com>
2023-10-15 MPT : support GQA for replit-code-v1.5 (#3627) (cebtenzzre)
2023-10-14 Honor -ngl option for CUDA offloading in llava (#3621) (M. Yusuf Sarıgöz)
2023-10-13 llama : remove n_threads from llama_decode_internal (#3614) (Daniel Bevenius)
This commit removes `n_threads` from the `llama_decode_internal` function's doc comment, as the parameter no longer exists. It looks like it was removed in commit 16bc66d9479edd5ee12ec734973554d4493c5dfa ("llama.cpp : split llama_context_params into model and context params").

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
2023-10-13 ggml : add context enumeration functions (#3605) (slaren)
finetune : fix assert failure in ggml-alloc
2023-10-12 CLBlast: Fix matrix-vector multiplication (#3544) (shibe2)
2023-10-12 examples: support LLaVA v1.5 (multimodal model) (#3436) (M. Yusuf Sarıgöz)
* WIP: start implementing LLaVA
* rm scratch buf for now, will revert after cleanup
* LLaVA image encoder is working. will combine with llama
* Add llava inference code, but it's buggy. debugging
* LLaVA is working e2e, needs to optimize memory allocation + cleanup
* Use ggml_allocr + rm unnecessary code
* fix: crlf -> lf
* fix: new line at EoF
* fix: trailing whitespace
* Add readme
* Update readme
* Some cleanup
* Are you happy editorconfig?
* rm unused batch image preprocessing
* rm unused import
* fix: rm designated initializers
* introduce pad-to-square mode for non-square images
* are you happy editorconfig?
* gitignore /llava
* Handle cases where image file does not exist
* add llava target to Makefile
* add support for 13b model variant
* Maybe seed is unlucky?
* Check if apples are compared to apples
* are you happy editorconfig?
* Use temperature = 0.1 by default
* command line: use gpt_params_parse()
* minor
* handle default n_predict
* fix typo
* llava : code formatting, rename files, fix compile warnings
* do not use Wno-cast-qual for MSVC

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-10-12 docs : fix typo GOMP_CPU_AFFINITY (#3597) (uint256_t)
2023-10-12 cmake : fix add_compile_options on macOS (Georgi Gerganov)
2023-10-12 typo : it is `--n-gpu-layers` not `--gpu-layers` (#3592) (Ian Scrivener)
Fixed a typo in the macOS Metal run documentation.
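For context, a hypothetical invocation using the corrected flag (the model path and layer count here are illustrative assumptions, not from the commit):

```shell
# The flag is --n-gpu-layers (short form -ngl), not --gpu-layers.
# Offload 32 layers to the GPU (e.g. Metal on macOS); model path is hypothetical.
./main -m ./models/llama-2-7b.Q4_0.gguf -p "Hello" --n-gpu-layers 32
```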
2023-10-12 ci : check if there is enough VRAM (#3596) (Georgi Gerganov)
ggml-ci
2023-10-12 server : add completion mode (no chat) (#3582) (Aarni Koskela)
2023-10-12 prompts : add mnemonics.txt (Georgi Gerganov)
2023-10-12 server : fix kv cache management (#3588) (Georgi Gerganov)
2023-10-11 main : fix session loading bug (#3400) (Georgi Gerganov)
2023-10-11 server : add parameter -tb N, --threads-batch N (#3584) (Michael Coppola)
Co-authored-by: Michael Coppola <info@michaeljcoppola.com>
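A hypothetical usage of the new flag, which lets the batch (prompt-processing) thread count differ from the generation thread count; the model path and thread counts are illustrative assumptions:

```shell
# -t / --threads controls generation threads; the new -tb / --threads-batch
# controls threads used for batch and prompt processing.
./server -m ./models/model.gguf -t 8 -tb 16
```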
2023-10-11 common : fix mirostat state when using multiple sequences (#3543) (Kerfuffle)
* Fix mirostat state when using multiple sequences
* Fix mirostat by completely refactoring sampling!
* Try to fix zig build.
* Export function to fetch/create default sampler states
  Code formatting cleanups and add some comments
  Silence a warning about id not being used when logging is disabled
* Apply some renaming suggestions.
  Fix comments that were out of sync with the pull.
* Use more consistent naming convention for sampling contexts
2023-10-11 batched : add bench tool (#3545) (Georgi Gerganov)
* batched : add bench tool
* batched : minor fix table
* batched-bench : add readme + n_kv_max is now configurable
* batched-bench : init warm-up batch
* batched-bench : pass custom set of PP, TG and PL
* batched-bench : add mmq CLI arg
2023-10-11 examples : add batched.swift + improve CI for swift (#3562) (Zane Shannon)
2023-10-10 Add MPT model to supported models in README.md (#3574) (Galunid)
2023-10-10 Minor improvements in GPT2 tokenizer (#3567) (goerch)
* Fixing minor bugs in bpe_gpt2_preprocess
* Don't add bos token in test
2023-10-10 readme : add bloom (#3570) (Xingchen Song(宋星辰))
2023-10-10 llm : add bloom models (#3553) (Xingchen Song(宋星辰))
* feat: Support bloom models
* fix(bloom): fix model size

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-10-10 swift : improvements and fixes (#3564) (Jhen-Jie Hong)
* swift : use macOS 12 as minimum requirement
* swift : add missing ggml-backend.c source
* swift : add -O3 -DNDEBUG unsafe flags
2023-10-10 llm : add MPT support (#3417) (Jan Ploski)
* CUDA: added support for ggml_clamp (see also: https://github.com/ggerganov/ggml/issues/545)
* mpt : added an implementation based (mostly) on falcon integration, modified with deltas from ggml/examples/mpt
* mpt : protect against "clip_qkv": null in mpt-7b
* mpt : quick fix to avoid "Strange model" warning when quantizing MPT models
* mpt : addendum to changeset:84e30e8 - leave parameter clamp_kqv out from metadata rather than use 0.0 to indicate "no clamping" (more compliant with the current GGUF spec?)
* mpt : standardized all tensor names to follow GGUF spec
* mpt : addendum to changeset:1be89c40 - use "req" parameter of GGUF_GET_KEY macro instead of duplicate code
* mpt : fixed comment s/gptneox/mpt/
* mpt : remove tabs, trailing whitespace
* mpt : removed ne01 + n_past == ne00 assertion from alibi (cuda/f32) and rope_shift from build_mpt
* mpt : updated convert-mpt-hf-to-gguf.py to reflect changes made to convert-gptneox-hf-to-gguf.py in pr:3252
* comment out n_past instead of marking it unused
* mpt : removed hardcoded +178 from convert script in favor of utilizing hparams["vocab_size"]
* mpt : remove unused tokenizer_json in convert script
* ggml : remove obsolete n_past assert in ggml_alibi
* llama : print clamp_kqv and max_alibi_bias hparams

Co-authored-by: Cebtenzzre <cebtenzzre@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-10-10 infill : fix tokenization (#3508) (vvhg1)
* infill tokens correction
* server infill tokens correction
* removing any leading whitespace from infill suffix and removing leading space token from suffix when params.escape
* removing any leading whitespace from infill suffix and removing leading space token from suffix when params.escape
* only rm when params.escape, rm space if possible which is added back or rm added space token
* only rm when params.escape, rm space if possible which is added back or rm added space token
* Revert "only rm when params.escape, rm space if possible which is added back or rm added space token"
  This reverts commit 63ba0b621f21077c0e3bc6ba6a327534123cb738.
* fix interactive prompt escaping and fix server infill leading space handling
* rm unnecessary bool check
2023-10-09 ggml-alloc : fix assert in debug builds (#3555) (slaren)
2023-10-09 refact : fix convert script + zero out KV cache to avoid nans (#3523) (Georgi Gerganov)
* refact : fix convert script + zero out KV cache to avoid nans
* ggml : silu(-inf) should never happen
* metal : assert various kernel requirements
2023-10-09 metal : do not use mul_mm kernels when ne00 < 64 (#3542) (Georgi Gerganov)
2023-10-08 sync : ggml (ggml-backend) (#3548) (Georgi Gerganov)
* sync : ggml (ggml-backend)
* zig : add ggml-backend to the build

ggml-ci
2023-10-08 ci : add Zig CI/CD and fix build (#2996) (Matheus C. França)
* zig CI/CD and fix build
* fix build_compiler
* ci : remove trailing whitespace

Signed-off-by: Matheus Catarino França <matheus-catarino@hotmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-10-08 api_like_OAI.py : compat with Microsoft Guidance (#2746) (Ryder Wishart)
Check for None in addition to the empty-string check in all request params.

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-10-08 api_like_OAI.py : simplify function (#2796) (arcrank)
2023-10-08 k-quants : fix comments about block sizing (#3499) (Johannes Rudolph)
2023-10-08 ci : enable on obj-c changes + fix metal build (#3540) (Georgi Gerganov)
2023-10-08 zig : fix build by introducing train.cpp (#3539) (Luo Tian)
2023-10-08 metal : support MTLGPUFamily < Apple7, formatting, style (#3524) (Georgi Gerganov)
* metal : improve decoding speed for batches of 2-16
* metal : rename kernels mul_mat_ to mul_mv_
* metal : indentations
* minor
* metal : print more GPU info + disable mul_mm for MTLGPUFamily < Apple7