Age | Commit message (Collapse) | Author |
|
Co-authored-by: xaedes <xaedes@gmail.com>
|
|
|
|
|
|
|
|
* Parallel RoPE on metal
* PR suggestion
---------
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
|
|
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
|
|
* metal : fix kernel_norm
ggml-ci
* metal : put warning in kernel_norm to not combine the loops
* metal : restore original F16 mat-vec multiplication
It works after the norm fixes
* common : don't do warm-up with more than n_batch tokens (close #3058)
ggml-ci
* metal : minor
|
|
* llama : use posix_madvise() instead of madvise() derived from BSD
sed -i 's,\<madvise\>,posix_&,g;s,\<MADV_,POSIX_&,g' llama.cpp
* ggml : use sysconf(_SC_PAGESIZE) instead of getpagesize() derived from BSD
sed -i 's,getpagesize(),sysconf(_SC_PAGESIZE),g' ggml.c
* metal : use sysconf(_SC_PAGESIZE) instead of getpagesize() derived from BSD
sed -i 's,getpagesize(),sysconf(_SC_PAGESIZE),g' ggml-metal.m
|
|
|
|
* convert-llama-ggmlv3-to-gguf: Try to handle files older than GGJTv3
* Better error messages for files that cannot be converted
* Add file type to GGUF output
* Rename to convert-llama-ggml-to-gguf.py
* Include original file type information in description
* Improve some informational output
|
|
|
|
|
|
|
|
* fix implicit int to string conversion
* convert : remove an obsolete pyright comment
---------
Co-authored-by: Cebtenzzre <cebtenzzre@gmail.com>
|
|
* Guard against all weights in a super-block being zero
* Also guard against extremely small weights
Closes #2982
---------
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
|
|
|
|
* speculative : add grammar support
* grammars : add json_arr.gbnf
* grammar : add comments to new grammar file
* grammar : remove one nested level
* common : warm-up with 2 tokens - seems to work better
* speculative : print draft token pieces
* speculative : reuse grammar parser + better logs and comments
* speculative : avoid grammar_mem
* make : fix speculative build
|
|
|
|
* build : on Mac OS enable Metal by default
* make : try to fix build on Linux
* make : move targets back to the top
* make : fix target clean
* llama : enable GPU inference by default with Metal
* llama : fix vocab_only logic when GPU is enabled
* common : better `n_gpu_layers` assignment
* readme : update Metal instructions
* make : fix merge conflict remnants
* gitignore : metal
|
|
|
|
|
|
|
|
* editorconfig: add override for the server HTML (which already is 2-space indented)
* server: add a subtle loading animation to the edit box
|
|
* 2x faster (rms) norm cuda kernels
* Fix code style
|
|
* ggml-alloc : use virtual memory for measurement
* compatibility fixes for MAP_ANONYMOUS
* fallback to fixed address for systems without virtual memory
|
|
* speculative : initial example
* speculative : print encoding speed
* speculative : add --draft CLI arg
|
|
|
|
|
|
|
|
|
|
|
|
This restores the generated text to be the same as before #2959
|
|
* update .gitignore
* makefile: add coverage support (lcov, gcovr)
* add code-coverage workflow
* update code coverage workflow
* wun on ubuntu 20.04
* use gcc-8
* check why the job hang
* add env vars
* add LLAMA_CODE_COVERAGE=1 again
* - add CODECOV_TOKEN
- add missing make lcov-report
* install lcov
* update make file -pb flag
* remove unused GGML_NITER from workflows
* wrap coverage output files in COV_TARGETS
|
|
Co-authored-by: Wentai Zhang <wentaizhang@tencent.com>
|
|
* Very minor speedup via simd-group synchronization in f16 x f32
* Another very minor speedup on metal
* Quite significant PP speedup on metal
* Another attempt
* Minor
* Massive improvement for TG for fp16
* ~4-5% improvement for Q8_0 TG on metal
---------
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
|
|
|
|
* convert.py: BPE fixes?
* Remove unnecessary conditional in addl token error handling
|
|
|
|
Co-authored-by: mmnga <mmnga1mmnga@gmail.com>
|
|
|
|
* make : remove unused -DGGML_BIG_ENDIAN
* make : put preprocessor stuff in CPPFLAGS
* make : pass Raspberry Pi arch flags to g++ as well
* make : support overriding CFLAGS/CXXFLAGS/CPPFLAGS/LDFLAGS
* make : fix inverted conditional
|
|
* logging: Fix creating empty file even when disabled
* Minor formatting fix
Co-authored-by: staviq <staviq@gmail.com>
---------
Co-authored-by: staviq <staviq@gmail.com>
|
|
* Update Windows CLBlast instructions
* Update Windows CLBlast instructions
* Remove trailing whitespace
|
|
* ggml_metal_init: Show all Metal device instances in the system
Also show the default Metal device that was picked.
* Update ggml-metal.m
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
|
|
* k-quants : fix build on armv7
* ggml : cleanup unused arm32 specific impl
* k-quants : avoid some unused vzero / mzero define
* ggml-alloc : use 4g for MEASURE_MAX_SIZE in 32-bit arm
|
|
|
|
|
|
* quick start command fix
* quick start win command fix
|
|
* Allow quantize tool to only copy tensors to allow repackaging models.
* Slightly better logic when requantizing.
* Change help message to go to `stdout`.
|
|
|