ik_llama.cpp.git

Age	Commit message	Author
2023-09-07	fix some warnings from gcc and clang-tidy (#3038)	Cebtenzzre
	Co-authored-by: xaedes <xaedes@gmail.com>
2023-09-07	make : improve test target (#3031)	Cebtenzzre

2023-09-07	make : fix CPPFLAGS (#3035)	Cebtenzzre

2023-09-07	llama-bench : use two tokens in the warmup run for prompt evals (#3059)	slaren

2023-09-07	metal : parallel RoPE on Metal (#3024)	Kawrakow
	* Parallel RoPE on metal
	* PR suggestion
	---------
	Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2023-09-07	metal : correct fix of kernel_norm (#3060)	Kawrakow
	Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
	Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-09-07	metal : fix kernel_norm (fixes Falcon on Metal) (#3057)	Georgi Gerganov
	* metal : fix kernel_norm
	  ggml-ci
	* metal : put warning in kernel_norm to not combine the loops
	* metal : restore original F16 mat-vec multiplication
	  It works after the norm fixes
	* common : don't do warm-up with more than n_batch tokens (close #3058)
	  ggml-ci
	* metal : minor
2023-09-07	ggml : posixify madvise and pagesize (#3037)	Przemysław Pawełczyk
	* llama : use posix_madvise() instead of madvise() derived from BSD
	  sed -i 's,\<madvise\>,posix_&,g;s,\<MADV_,POSIX_&,g' llama.cpp
	* ggml : use sysconf(_SC_PAGESIZE) instead of getpagesize() derived from BSD
	  sed -i 's,getpagesize(),sysconf(_SC_PAGESIZE),g' ggml.c
	* metal : use sysconf(_SC_PAGESIZE) instead of getpagesize() derived from BSD
	  sed -i 's,getpagesize(),sysconf(_SC_PAGESIZE),g' ggml-metal.m
2023-09-06	k-quants : fix zero-weight guard in Q6_K (ref #3040)	Georgi Gerganov

2023-09-06	convert-llama-ggml-to-gguf: Try to handle files older than GGJTv3 (#3023)	Kerfuffle
	* convert-llama-ggmlv3-to-gguf: Try to handle files older than GGJTv3
	* Better error messages for files that cannot be converted
	* Add file type to GGUF output
	* Rename to convert-llama-ggml-to-gguf.py
	* Include original file type information in description
	* Improve some informational output
2023-09-05	build : add LLAMA_METAL_NDEBUG flag (#3033)	Cebtenzzre

2023-09-05	make : use new flag variables for recent changes (#3019)	Cebtenzzre

2023-09-05	examples : replace fprintf to stdout with printf (#3017)	Cebtenzzre

2023-09-05	convert: fix convert.py not working with int filename_stem (#3028)	Erik Scholz
	* fix implicit int to string conversion
	* convert : remove an obsolete pyright comment
	---------
	Co-authored-by: Cebtenzzre <cebtenzzre@gmail.com>
2023-09-05	Guard against all weights in a super-block being zero (#3010)	Kawrakow
	* Guard against all weights in a super-block being zero
	* Also guard against extremely small weights
	Closes #2982
	---------
	Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2023-09-05	llama : update logic for number of threads when using BLAS	Georgi Gerganov

2023-09-05	speculative : add grammar support (#2991)	Georgi Gerganov
	* speculative : add grammar support
	* grammars : add json_arr.gbnf
	* grammar : add comments to new grammar file
	* grammar : remove one nested level
	* common : warm-up with 2 tokens - seems to work better
	* speculative : print draft token pieces
	* speculative : reuse grammar parser + better logs and comments
	* speculative : avoid grammar_mem
	* make : fix speculative build
2023-09-04	py : minor	Georgi Gerganov

2023-09-04	build : on Mac OS enable Metal by default (#2901)	Georgi Gerganov
	* build : on Mac OS enable Metal by default
	* make : try to fix build on Linux
	* make : move targets back to the top
	* make : fix target clean
	* llama : enable GPU inference by default with Metal
	* llama : fix vocab_only logic when GPU is enabled
	* common : better `n_gpu_layers` assignment
	* readme : update Metal instructions
	* make : fix merge conflict remnants
	* gitignore : metal
2023-09-04	ggml-opencl : store GPU buffer in ggml_tensor::extra (#2994)	slaren

2023-09-04	llama-bench : make cpp file non-executable (#2999)	Cebtenzzre

2023-09-04	make : add speculative example (#3003)	Leng Yue

2023-09-04	server : add a subtle loading animation to the edit box (#2466)	Aarni Koskela
	* editorconfig: add override for the server HTML (which already is 2-space indented)
	* server: add a subtle loading animation to the edit box
2023-09-04	2x faster (rms) norm cuda kernels (3.7% e2e improvement) (#2985)	Jiahao Li
	* 2x faster (rms) norm cuda kernels
	* Fix code style
2023-09-03	ggml-alloc : use virtual memory for measurement (#2973)	slaren
	* ggml-alloc : use virtual memory for measurement
	* compatibility fixes for MAP_ANONYMOUS
	* fallback to fixed address for systems without virtual memory
2023-09-03	speculative : PoC for speeding-up inference via speculative sampling (#2926)	Georgi Gerganov
	* speculative : initial example
	* speculative : print encoding speed
	* speculative : add --draft CLI arg
2023-09-03	perplexity : fix ETA by warming up the model with an empty run	Georgi Gerganov

2023-09-03	gguf(python): Fix special vocab handling when id < 0 (#2984)	Kerfuffle

2023-09-03	metal : restore 363f0bf and fix reduce in F16_F32 kernels (#2986)	Georgi Gerganov

2023-09-03	cov : disable comment in PRs (#2989)	Alon

2023-09-03	llama : fix bpe tokenize from byte (#2889)	opparco

2023-09-03	metal : revert 6af0bab until we fix it	Georgi Gerganov
	This restores the generated text to be the same as before #2959
2023-09-03	cov : add Code Coverage and codecov.io integration (#2928)	Alon
	* update .gitignore
	* makefile: add coverage support (lcov, gcovr)
	* add code-coverage workflow
	* update code coverage workflow
	* wun on ubuntu 20.04
	* use gcc-8
	* check why the job hang
	* add env vars
	* add LLAMA_CODE_COVERAGE=1 again
	* - add CODECOV_TOKEN
	  - add missing make lcov-report
	* install lcov
	* update make file -pb flag
	* remove unused GGML_NITER from workflows
	* wrap coverage output files in COV_TARGETS
2023-09-03	opencl : fix a bug in ggml_cl_pool_malloc() for ggml_cl_mul_mat_f32() (#2955)	Wentai Zhang
	Co-authored-by: Wentai Zhang <wentaizhang@tencent.com>
2023-09-03	metal : more optimizations (#2959)	Kawrakow
	* Very minor speedup via simd-group synchronization in f16 x f32
	* Another very minor speedup on metal
	* Quite significant PP speedup on metal
	* Another attempt
	* Minor
	* Massive improvement for TG for fp16
	* ~4-5% improvement for Q8_0 TG on metal
	---------
	Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
	Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-09-03	swift : add support for k-quants (#2983)	kchro3

2023-09-03	convert.py : BPE fixes (#2938)	Kerfuffle
	* convert.py: BPE fixes?
	* Remove unnecessary conditional in addl token error handling
2023-09-03	docs : add `catai` to `README.md` (#2967)	Ido S

2023-09-03	examples : fix gpt-neox (#2943)	momonga
	Co-authored-by: mmnga <mmnga1mmnga@gmail.com>
2023-09-03	swift : add missing c file to Package.swift (#2978)	kchro3

2023-09-03	make : support overriding CFLAGS/CXXFLAGS/CPPFLAGS/LDFLAGS (#2886)	Cebtenzzre
	* make : remove unused -DGGML_BIG_ENDIAN
	* make : put preprocessor stuff in CPPFLAGS
	* make : pass Raspberry Pi arch flags to g++ as well
	* make : support overriding CFLAGS/CXXFLAGS/CPPFLAGS/LDFLAGS
	* make : fix inverted conditional
2023-09-02	logging: Fix creating empty file even when disabled (#2966)	Kerfuffle
	* logging: Fix creating empty file even when disabled
	* Minor formatting fix
	  Co-authored-by: staviq <staviq@gmail.com>
	---------
	Co-authored-by: staviq <staviq@gmail.com>
2023-09-02	readme : update clblast instructions (#2903)	bandoti
	* Update Windows CLBlast instructions
	* Update Windows CLBlast instructions
	* Remove trailing whitespace
2023-09-02	metal : show all Metal device instances in the system (#2952)	Karsten Weiss
	* ggml_metal_init: Show all Metal device instances in the system
	  Also show the default Metal device that was picked.
	* Update ggml-metal.m
	---------
	Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-09-02	k-quants : fix build on armv7 (android only) (#2920)	Jhen-Jie Hong
	* k-quants : fix build on armv7
	* ggml : cleanup unused arm32 specific impl
	* k-quants : avoid some unused vzero / mzero define
	* ggml-alloc : use 4g for MEASURE_MAX_SIZE in 32-bit arm
2023-09-02	server : avoid aniprompt in probabilities of final response (#2849)	Jhen-Jie Hong

2023-09-01	cuda : vsubss4 for older versions of ROCm/clang (#2942)	Engininja2

2023-09-01	readme : quick start command fix (#2908)	ZHAOKAI WANG
	* quick start command fix
	* quick start win command fix
2023-09-01	Allow quantize to only copy tensors, some other improvements (#2931)	Kerfuffle
	* Allow quantize tool to only copy tensors to allow repackaging models.
	* Slightly better logic when requantizing.
	* Change help message to go to `stdout`.
2023-09-01	llama2c : rename function	Georgi Gerganov