* editorconfig: add override for the server HTML (which is already 2-space indented)
* server: add a subtle loading animation to the edit box
* 2x faster (RMS) norm CUDA kernels
* Fix code style
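For context, RMS norm scales each row of a tensor by the reciprocal of the root-mean-square of its elements. Below is a minimal scalar sketch of the quantity involved; the function name and eps default are illustrative, not the actual kernel code, and the CUDA kernels split the sum-of-squares reduction across the threads of a block (one block per row).

```cpp
#include <cmath>
#include <cstddef>

// Scalar reference for RMS norm over one row:
//   y[i] = x[i] / sqrt(eps + mean(x^2))
void rms_norm_row(const float * x, float * y, size_t n, float eps = 1e-5f) {
    double sumsq = 0.0;
    for (size_t i = 0; i < n; ++i) {
        sumsq += (double) x[i] * x[i];      // accumulate x^2
    }
    const float scale = 1.0f / std::sqrt((float) (sumsq / n) + eps);
    for (size_t i = 0; i < n; ++i) {
        y[i] = x[i] * scale;                // normalize the row
    }
}
```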
* ggml-alloc : use virtual memory for measurement
* compatibility fixes for MAP_ANONYMOUS
* fallback to fixed address for systems without virtual memory
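A sketch of the portability pattern behind the MAP_ANONYMOUS fixes, assuming a POSIX system (illustrative code, not the actual ggml-alloc source): some older BSD-derived headers only define the spelling MAP_ANON, and a measurement buffer can be plain reserved address space because it is never written.

```cpp
#include <sys/mman.h>
#include <cstddef>

// Older BSD-derived headers spell the flag MAP_ANON; aliasing it is the
// standard compatibility fix.
#ifndef MAP_ANONYMOUS
#define MAP_ANONYMOUS MAP_ANON
#endif

// Reserve a large range of virtual address space without committing
// physical memory: PROT_NONE pages are never touched, so a "measurement"
// pass can plan worst-case allocations almost for free.
static void * reserve_measure_buffer(size_t max_size) {
    void * ptr = mmap(nullptr, max_size, PROT_NONE,
                      MAP_PRIVATE | MAP_ANONYMOUS, /*fd=*/-1, /*offset=*/0);
    return ptr == MAP_FAILED ? nullptr : ptr;
}
```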
* speculative : initial example
* speculative : print encoding speed
* speculative : add --draft CLI arg
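The idea behind the speculative example, as a hedged C++ sketch (`draft_next` and `target_greedy` are illustrative stand-ins, not llama.cpp APIs): a cheap draft model proposes up to `--draft` tokens, the full model scores all of them in one batched pass, and the longest agreeing prefix is accepted. In the greedy setting the output is identical to running the target model alone.

```cpp
#include <vector>

using token = int;

// Illustrative stand-ins -- not llama.cpp APIs.
token              draft_next(const std::vector<token> & ctx);        // cheap draft model
std::vector<token> target_greedy(const std::vector<token> & ctx,      // big model: greedy token at
                                 const std::vector<token> & drafted); // each drafted position, in ONE pass

std::vector<token> generate(std::vector<token> ctx, int n_draft, int n_new) {
    for (int produced = 0; produced < n_new; ) {
        // 1) draft model proposes up to n_draft tokens (the --draft parameter)
        std::vector<token> drafted;
        std::vector<token> tmp = ctx;
        for (int i = 0; i < n_draft; ++i) {
            token t = draft_next(tmp);
            drafted.push_back(t);
            tmp.push_back(t);
        }
        // 2) target model evaluates all drafted positions in a single batch
        std::vector<token> truth = target_greedy(ctx, drafted);
        // 3) accept drafted tokens while they match; the first mismatch is
        //    replaced by the target's own token, so every round makes progress
        for (size_t i = 0; i < truth.size() && produced < n_new; ++i) {
            ctx.push_back(truth[i]);
            ++produced;
            if (truth[i] != drafted[i]) break;   // divergence: redraft from here
        }
    }
    return ctx;
}
```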
This restores the generated text to match what it was before #2959.
* update .gitignore
* makefile: add coverage support (lcov, gcovr)
* add code-coverage workflow
* update code coverage workflow
* run on ubuntu 20.04
* use gcc-8
* check why the job hangs
* add env vars
* add LLAMA_CODE_COVERAGE=1 again
* add CODECOV_TOKEN and the missing make lcov-report step
* install lcov
* update makefile -pb flag
* remove unused GGML_NITER from workflows
* wrap coverage output files in COV_TARGETS
Co-authored-by: Wentai Zhang <wentaizhang@tencent.com>
* Very minor speedup via simd-group synchronization in f16 x f32
* Another very minor speedup on Metal
* Quite significant prompt-processing (PP) speedup on Metal
* Another attempt
* Minor
* Massive improvement in text-generation (TG) speed for fp16
* ~4-5% improvement for Q8_0 TG on Metal
---------
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* convert.py: BPE fixes
* Remove unnecessary conditional in additional-token error handling
Co-authored-by: mmnga <mmnga1mmnga@gmail.com>
* make : remove unused -DGGML_BIG_ENDIAN
* make : put preprocessor stuff in CPPFLAGS
* make : pass Raspberry Pi arch flags to g++ as well
* make : support overriding CFLAGS/CXXFLAGS/CPPFLAGS/LDFLAGS
* make : fix inverted conditional
* logging: Fix creating empty file even when disabled
* Minor formatting fix
---------
Co-authored-by: staviq <staviq@gmail.com>
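A sketch of the lazy-open pattern such a fix typically uses (names and the log path are illustrative): defer opening the log file until the first real write, so a disabled logger never creates an empty file on disk.

```cpp
#include <cstdio>

static bool   g_log_enabled = false;
static FILE * g_log_file    = nullptr;

static void log_write(const char * msg) {
    if (!g_log_enabled) {
        return;                                   // disabled: never touches the filesystem
    }
    if (!g_log_file) {
        g_log_file = std::fopen("run.log", "a");  // opened on first use only
        if (!g_log_file) return;
    }
    std::fputs(msg, g_log_file);
    std::fflush(g_log_file);
}
```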
* Update Windows CLBlast instructions
* Update Windows CLBlast instructions
* Remove trailing whitespace
* ggml_metal_init: Show all Metal device instances in the system
Also show the default Metal device that was picked.
* Update ggml-metal.m
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* k-quants : fix build on armv7
* ggml : cleanup unused arm32 specific impl
* k-quants : avoid some unused vzero / mzero define
* ggml-alloc : use 4g for MEASURE_MAX_SIZE in 32-bit arm
* quick start command fix
* quick start Windows command fix
* Allow the quantize tool to only copy tensors, so models can be repackaged.
* Slightly better logic when requantizing.
* Change help message to go to `stdout`.
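A hedged sketch of the control flow a copy-only mode adds (types and helpers below are illustrative stubs, not the quantize tool's actual code): the tool walks every tensor and either requantizes it or passes the raw bytes through unchanged, which lets it repackage a model without touching the weights.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Illustrative tensor record and stubs.
struct tensor_in {
    std::string          name;
    int                  type;    // type id of the stored data
    std::vector<uint8_t> bytes;   // raw tensor data
};

// stub: a real tool would convert bytes to the requested quantized format
static std::vector<uint8_t> quantize_data(const tensor_in & t, int /*new_type*/) {
    return t.bytes;
}
// stub: a real tool would append the tensor to the output model file
static void write_tensor(const std::string &, int, const std::vector<uint8_t> &) {}

// Copy-only mode: with requantization disabled, every tensor passes through
// verbatim, so the model can be repackaged without altering any weights.
static void repack(const std::vector<tensor_in> & tensors, int new_type, bool copy_only) {
    for (const tensor_in & t : tensors) {
        if (copy_only) {
            write_tensor(t.name, t.type, t.bytes);   // byte-for-byte copy
        } else {
            write_tensor(t.name, new_type, quantize_data(t, new_type));
        }
    }
}
```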
Fixes #2922
* made the methods const
* made method const
* Update convert-llama2c-to-ggml.cpp
removed write_raw and write_u32
* llama2c : remove misleading const
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* fix most gcc and clang warnings
* baby-llama : remove commented opt_params_adam
* fix some MinGW warnings
* fix more MinGW warnings
* added support for RISC-V CFLAGS & native compile + cross-compile options
* Add RISC-V Vector (RVV) Intrinsics Support
Added RVV intrinsics for the following:
ggml_vec_dot_q4_0_q8_0
ggml_vec_dot_q4_1_q8_1
ggml_vec_dot_q5_0_q8_0
ggml_vec_dot_q5_1_q8_1
ggml_vec_dot_q8_0_q8_0
Co-authored-by: Sharafat <sharafat.hussain@10xengineers.ai>
Signed-off-by: Ahmad Tameem <ahmad.tameem@10xengineers.ai>
---------
Signed-off-by: Ahmad Tameem <ahmad.tameem@10xengineers.ai>
Co-authored-by: moiz.hussain <moiz.hussain@10xengineers.ai>
Co-authored-by: Sharafat <sharafat.hussain@10xengineers.ai>
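For reference, here is a scalar sketch of what ggml_vec_dot_q8_0_q8_0 computes, i.e. the loop the RVV intrinsics vectorize. The struct below uses a float scale for clarity; ggml stores it as fp16, and the function name is illustrative.

```cpp
#include <cstdint>
#include <cstddef>

constexpr int QK8_0 = 32;

struct block_q8_0 {
    float  d;            // block scale (fp16 in ggml; float here for clarity)
    int8_t qs[QK8_0];    // quantized values
};

// Dot product of two q8_0 vectors: for each block, an integer dot product
// of the int8 quants, scaled by both blocks' scales.
float vec_dot_q8_0_ref(size_t n, const block_q8_0 * x, const block_q8_0 * y) {
    float sum = 0.0f;
    for (size_t ib = 0; ib < n / QK8_0; ++ib) {
        int32_t isum = 0;
        for (int i = 0; i < QK8_0; ++i) {
            isum += (int32_t) x[ib].qs[i] * y[ib].qs[i];   // int8 multiply-accumulate
        }
        sum += (float) isum * x[ib].d * y[ib].d;           // apply scales
    }
    return sum;
}
```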
* fix mingw-like builds
* formatting
* make LOG_COMPAT easier to override and extend
* simplify win detection
* fix for #2940
* llama2c : fix segfault if vocab is not found
* llama2c : fix mismatch between new[] and delete
* llama2c : fix basename on Windows
* llama2c : use a destructor to prevent memory leaks
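The new[]/delete mismatch is worth a sketch: memory obtained with new[] must be released with delete[] (plain delete is undefined behavior), and moving the release into a destructor also prevents the leak on early-exit paths. The type below is illustrative, not the converter's actual code.

```cpp
#include <cstddef>

// Buffer owner with a destructor: delete[] matches new[], and it runs on
// every exit path, so nothing leaks.
struct FloatBuffer {
    float * data = nullptr;

    explicit FloatBuffer(size_t n) : data(new float[n]) {}
    ~FloatBuffer() { delete[] data; }             // matches new[]

    // non-copyable: exactly one owner of the allocation
    FloatBuffer(const FloatBuffer &) = delete;
    FloatBuffer & operator=(const FloatBuffer &) = delete;
};

void process(size_t n) {
    FloatBuffer buf(n);       // freed automatically, even on early return
    // ... fill and use buf.data ...
}                             // ~FloatBuffer runs here
```

std::unique_ptr<float[]> achieves the same ownership with less code; the explicit destructor just makes the fix visible.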
* Somewhat faster f16 x f32 matrix multiply kernel
* Better: use 32 thread groups for f16 x f32
---------
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
* scripts: Use local gguf when running from repo
Reimplement fix for `PrefetchVirtualMemory`.
Co-authored-by: vxiiduu <73044267+vxiiduu@users.noreply.github.com>
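A hedged sketch of the usual shape of this fix (illustrative code assuming a Windows 8+ SDK, not the exact llama.cpp source): PrefetchVirtualMemory hints the OS to page a memory-mapped model file in ahead of use, but the API only exists on Windows 8+, so it is resolved at runtime rather than linked directly and is simply skipped on older systems.

```cpp
#ifdef _WIN32
#include <windows.h>
#include <cstddef>

static void prefetch_mapping(void * addr, size_t len) {
    HMODULE kernel32 = GetModuleHandleW(L"kernel32.dll");
    if (!kernel32) return;

    using PrefetchFn = BOOL (WINAPI *)(HANDLE, ULONG_PTR,
                                       PWIN32_MEMORY_RANGE_ENTRY, ULONG);
    auto pfn = reinterpret_cast<PrefetchFn>(
        GetProcAddress(kernel32, "PrefetchVirtualMemory"));
    if (!pfn) return;  // pre-Windows-8: no prefetch, the mapping still works

    WIN32_MEMORY_RANGE_ENTRY range;
    range.VirtualAddress = addr;
    range.NumberOfBytes  = (SIZE_T) len;
    pfn(GetCurrentProcess(), 1, &range, 0);
}
#endif
```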
* convert : fix python 3.8 support
* convert : sort imports
* convert : fix required parameters in convert-llama-ggmlv3-to-gguf
* convert : fix mypy errors in convert-llama-ggmlv3-to-gguf
* convert : use PEP 585 generics and PEP 604 unions
Now that we have `from __future__ import annotations`, we can use this
modern syntax (e.g. `list[int]` instead of `List[int]`, and `str | None`
instead of `Optional[str]`) on Python 3.7+, instead of restricting support
to Python 3.9 (PEP 585) or 3.10 (PEP 604) respectively.
* gguf.py : a tuple is already a tuple
* add mypy.ini
* convert : add necessary `type: ignore` comments
* gguf-py: bump version
* [Docker] fix tools.sh argument passing.
This should allow passing multiple arguments to containers built from
the full image that use the tools.sh frontend.
Fix from https://github.com/ggerganov/llama.cpp/issues/2535#issuecomment-1697091734