Age | Commit message (Collapse) | Author |
|
* Fix mirostat state when using multiple sequences
* Fix mirostat by completely refactoring sampling!
* Try to fix zig build.
* Export function to fetch/create default sampler states
Code formatting cleanups and add some comments
Silence a warning about id not being used when logging is disabled
* Apply some renaming suggestions.
Fix comments that were out of sync with the pull.
* Use more consistent naming convention for sampling contexts
|
|
* batched : add bench tool
* batched : minor fix table
* batched-bench : add readme + n_kv_max is now configurable
* batched-bench : init warm-up batch
* batched-bench : pass custom set of PP, TG and PL
* batched-bench : add mmq CLI arg
|
|
* Fixing minor bugs in bpe_gpt2_preprocess
* Don't add bos token in test
|
|
* feat: Support bloom models
* fix(bloom): fix model size
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
|
|
* swift : use macOS 12 as minimum requirement
* swift : add missing ggml-backend.c source
* swift : add -O3 -DNDEBUG unsafe flags
|
|
* CUDA: added support for ggml_clamp (see also: https://github.com/ggerganov/ggml/issues/545)
* mpt : added an implementation based (mostly) on falcon integration, modified with deltas from ggml/examples/mpt
* mpt : protect against "clip_qkv": null in mpt-7b
* mpt : quick fix to avoid "Strange model" warning when quantizing MPT models
* mpt : addendum to changeset:84e30e8 - leave parameter clamp_kqv out from metadata rather than use 0.0 to indicate "no clamping" (more compliant with the current GGUF spec?)
* mpt : standardized all tensor names to follow GGUF spec
* mpt : addendum to changeset:1be89c40 - use "req" parameter of GGUF_GET_KEY macro instead of duplicate code
* mpt : fixed comment s/gptneox/mpt/
* mpt : remove tabs, trailing whitespace
* mpt : removed ne01 + n_past == ne00 assertion from alibi (cuda/f32) and rope_shift from build_mpt
* mpt : updated convert-mpt-hf-to-gguf.py to reflect changes made to convert-gptneox-hf-to-gguf.py in pr:3252
* comment out n_past instead of marking it unused
* mpt : removed hardcoded +178 from convert script in favor of utilizing hparams["vocab_size"]
* mpt : remove unused tokenizer_json in convert script
* ggml : remove obsolete n_past assert in ggml_alibi
* llama : print clamp_kqv and max_alibi_bias hparams
---------
Co-authored-by: Cebtenzzre <cebtenzzre@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
|
|
* infill tokens correction
* server infill tokens correction
* removing any leading whitespace from infill suffix and removing leading space token from suffix when params.escape
* removing any leading whitespace from infill suffix and removing leading space token from suffix when params.escape
* only rm when params.escape, rm space if possible which is added back or rm added space token
* only rm when params.escape, rm space if possible which is added back or rm added space token
* Revert "only rm when params.escape, rm space if possible which is added back or rm added space token"
This reverts commit 63ba0b621f21077c0e3bc6ba6a327534123cb738.
* fix interactive prompt escaping and fix server infill leading space handling
* rm unnecessary bool check
|
|
* refact : fix convert script + zero out KV cache to avoid nans
* ggml : silu(-inf) should never happen
* metal : assert various kernel requirements
|
|
* sync : ggml (ggml-backend)
ggml-ci
* zig : add ggml-backend to the build
|
|
* zig CI/CD and fix build
Signed-off-by: Matheus Catarino França <matheus-catarino@hotmail.com>
* fix build_compiler
* ci : remove trailing whitespace
---------
Signed-off-by: Matheus Catarino França <matheus-catarino@hotmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
|
|
Check for None in addition to empty string check in all request params
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
|
|
Simplify function
|
|
* metal : improve decoding speed for batches of 2-16
* metal : rename kernels mul_mat_ to mul_mv_
* metal : indentations
* minor
* metal : print more GPU info + disable mul_mm for MTLGPUFamily < Apple7
|
|
(#3534)
|
|
* Fix CI for publishing GGUF package
* Bump version
* fix
* bump version
* bump version
* bump version
|
|
Co-authored-by: Lyjia <me@lyjia.us>
|
|
* metal : support load default.metallib & reuse code for swift package
* metal : use SWIFT_PACKAGE def instead of define GGML_SWIFT
|
|
* Produces garbage output
* wip: correct tensors up to RoPE
* correct tensors thru RoPE
* Correct outputs through masked & softmax'd KQ
* fp32 works
* Rename adept->persimmon
* Produces correct outputs
* clean up convert scripts
* remove printing logic from ggml.c
* remove prints from llama.cpp & fix merge
* trivial cleanups
* Add offload funcs
* update conversion script to directly take adept artifacts rather than .safetensors file
* Fix norm eps bug
* Support sqr and concat on metal, persimmon-8b-q4 runs correctly
* Small changes from review
* Formatting changes
* Minor changes to conversion script
* Remove old script
* Fix editorconfig formatting
* Fix build
* add overlooked offload code ggml-ci
|
|
Fix: `sentencepiece` tokenizers with added tokens failed with an incorrect assertion
|
|
* kv cache slot search improvements
* Use n_ctx in kv find slot for consistency
* Ensure kv cache head points to a valid slot in llama_decode internal
* Add some comments to prevent dumb people (like me) from getting confused.
|
|
* Enable external file and add datestamp
* Add name of external file at end
* Upload ToK2024
* Delete ToK2024.txt
* Experiments with jeopardy
* Move ParallelQuestions to /prompts and rename
* Interim commit
* Interim commit
* Final revision
* Remove trailing whitespace
* remove cmake_all.sh
* Remove cmake_all.sh
* Changed .gitignore
* Improved reporting and new question files.
* Corrected typo
* More LLM questions
* Update LLM-questions.txt
* Yet more LLM-questions
* Remove jeopardy results file
* Reinstate original jeopardy.sh
* Update examples/parallel/parallel.cpp
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
|
|
* server : reuse llama_sample_token common function
* common : use n_probs for temperature sampling
|
|
* fixed floating point comparison issues
* updated implementation for hparam comparison to handle inf and NaN
* fixed code review comments
* minor simplification
* rename is_float_eq -> is_float_close
---------
Co-authored-by: Cebtenzzre <cebtenzzre@gmail.com>
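The hparam comparison change above (tolerant float comparison that also handles inf and NaN) can be illustrated with a minimal sketch. This is not the code from the commit: the name `float_close_sketch`, the absolute-tolerance parameter, and the choice to treat NaN as never close are assumptions made for illustration.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

// Hypothetical helper: returns true when a and b are equal or differ by at
// most abs_tol. The name and the exact policy are assumptions, not the
// project's implementation.
static bool float_close_sketch(float a, float b, float abs_tol) {
    assert(abs_tol >= 0.0f);                 // a negative tolerance is a caller bug

    if (a == b) {
        return true;                         // exact match, including same-signed infinities
    }
    if (isnan(a) || isnan(b)) {
        return false;                        // NaN is treated as close to nothing
    }
    if (isinf(a) || isinf(b)) {
        return false;                        // an infinity is only close to itself (handled above)
    }
    return fabsf(b - a) <= abs_tol;          // ordinary finite comparison
}
```

Checking equality first means two identical infinities compare as close without ever computing `inf - inf`, which would yield NaN.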
|
|
* ci : fix xcodebuild destinations
* ci : add .swift to paths
|
|
Also adds Falcon-180B support.
Closes #3049
Co-authored-by: jb <jonathan.t.barnard@gmail.com>
|
|
Fix uploading tensor data to device, including 3D, 4D, and non-contiguous tensors.
Use correct offsets into data that is already in VRAM.
Correct handling of OpenCL events when multiple commands are queued.
|
|
Use local GGUF package when possible in Baichuan converter
|
|
* add refact model
* resolve comments
* rebase to the latest
* solve alibi cpu error
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
|
|
* sync : ggml (conv 1d + 2d updates)
ggml-ci
* ggml : fix UB in q5_0 and q5_1 quantize code
ggml.c:1033:39: runtime error: left shift of 1 by 31 places cannot be represented in type 'int'
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior
ggml.c:1081:39: runtime error: left shift of 1 by 31 places cannot be represented in type 'int'
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior
ggml-ci
* tests : fix UB in test-quantize-perf
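The sanitizer message above ("left shift of 1 by 31 places cannot be represented in type 'int'") comes from shifting a value that integer promotion has turned into a signed `int`. A minimal sketch of the usual remedy follows; the function `pack_bits_sketch` is hypothetical and is not the code in ggml.c, it only shows the cast-before-shift pattern.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical example: pack the low bit of each input byte into one
// 32-bit word, bit j of the result coming from bits[j].
static uint32_t pack_bits_sketch(const uint8_t *bits, int n) {
    assert(n >= 0 && n <= 32);               // at most 32 bits fit in the result

    uint32_t out = 0;
    for (int j = 0; j < n; ++j) {
        // A uint8_t operand is promoted to signed int, so shifting it left by
        // 31 would be undefined. Converting to uint32_t before the shift keeps
        // the operation well-defined for every j in [0, 31].
        out |= (uint32_t)(bits[j] & 1u) << j;
    }
    return out;
}
```

With the cast in place the shift is performed on an unsigned 32-bit value, so setting bit 31 no longer triggers UndefinedBehaviorSanitizer.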
|
|
Fix small typo
|