* kv cache slot search improvements
* Use n_ctx in kv find slot for consistency
* Ensure kv cache head points to a valid slot in llama_decode internal
* Add some comments so readers (including me) don't get confused (see the sketch below).
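
A minimal sketch of the contiguous-slot search described above, assuming a ring-style cache where `pos == -1` marks a free cell. The names (`kv_cell`, `kv_find_slot`) are illustrative, not the actual llama.cpp internals:

    #include <cstdint>
    #include <vector>

    struct kv_cell {
        int32_t pos = -1;                       // -1 marks a free cell
    };

    struct kv_cache {
        uint32_t head = 0;                      // where the next search starts
        uint32_t n_ctx = 0;                     // total number of cells
        std::vector<kv_cell> cells;
    };

    // Find n_tokens contiguous free cells, wrapping the search once;
    // on success, cache.head points at a valid slot.
    static bool kv_find_slot(kv_cache & cache, uint32_t n_tokens) {
        if (n_tokens > cache.n_ctx) {
            return false;                       // can never fit
        }
        uint32_t n_tested = 0;
        while (true) {
            if (cache.head + n_tokens > cache.n_ctx) {
                // the run would fall off the end: wrap to the start
                n_tested += cache.n_ctx - cache.head;
                cache.head = 0;
                continue;
            }
            bool found = true;
            for (uint32_t i = 0; i < n_tokens; i++) {
                if (cache.cells[cache.head + i].pos >= 0) {
                    // occupied cell: restart the search just past it
                    cache.head += i + 1;
                    n_tested   += i + 1;
                    found = false;
                    break;
                }
            }
            if (found) {
                return true;
            }
            if (n_tested >= cache.n_ctx) {
                return false;                   // searched the whole cache
            }
        }
    }

Bounding the search by `n_ctx` is the consistency point of the first bullet.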
* Enable external file and add datestamp
* Add name of external file at end
* Upload ToK2024
* Delete ToK2024.txt
* Experiments with jeopardy
* Move ParallelQuestions to /prompts and rename
* Interim commit
* Interim commit
* Final revision
* Remove trailing whitespace
* remove cmake_all.sh
* Remove cmake_all.sh
* Changed .gitignore
* Improved reporting and new question files.
* Corrected typo
* More LLM questions
* Update LLM-questions.txt
* Yet more LLM-questions
* Remove jeopardy results file
* Reinstate original jeopardy.sh
* Update examples/parallel/parallel.cpp
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* server : reuse llama_sample_token common function
* common : use n_probs for temperature sampling
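
A standalone sketch in the spirit of these two changes: temperature sampling that also surfaces the top `n_probs` token probabilities, as a server would report them back. This is an assumption about what the shared path does, not the actual `llama_sample_token` signature; it assumes `n_probs <= logits.size()`:

    #include <algorithm>
    #include <cmath>
    #include <random>
    #include <utility>
    #include <vector>

    // Returns the sampled token id; fills top_probs with the n_probs most
    // probable (token, probability) pairs after the temperature softmax.
    int sample_temperature(const std::vector<float> & logits, float temp, size_t n_probs,
                           std::mt19937 & rng,
                           std::vector<std::pair<int, float>> & top_probs) {
        // softmax with temperature, shifted by the max logit for stability
        std::vector<float> p(logits.size());
        const float max_l = *std::max_element(logits.begin(), logits.end());
        float sum = 0.0f;
        for (size_t i = 0; i < p.size(); i++) {
            p[i] = std::exp((logits[i] - max_l) / temp);
            sum += p[i];
        }
        for (float & v : p) v /= sum;
        // record the n_probs most probable tokens
        std::vector<int> idx(p.size());
        for (size_t i = 0; i < idx.size(); i++) idx[i] = (int) i;
        std::partial_sort(idx.begin(), idx.begin() + n_probs, idx.end(),
                          [&](int a, int b) { return p[a] > p[b]; });
        top_probs.clear();
        for (size_t i = 0; i < n_probs; i++) top_probs.emplace_back(idx[i], p[idx[i]]);
        // draw from the full distribution
        std::discrete_distribution<int> dist(p.begin(), p.end());
        return dist(rng);
    }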
* fixed floating-point comparison issues
* updated the hparam comparison to handle inf and NaN (sketched below)
* fixed code review comments
* minor simplification
* rename is_float_eq -> is_float_close
---------
Co-authored-by: Cebtenzzre <cebtenzzre@gmail.com>
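
A sketch of the comparison the rename points at: relative tolerance for finite values, explicit answers for inf and NaN. Illustrative only; the real `is_float_close` lives in the test code and may differ in detail:

    #include <cmath>

    static bool is_float_close(float a, float b, float rel_tol) {
        // exact equality covers matching infinities (and zeros)
        if (a == b) {
            return true;
        }
        // NaN compares unequal to everything, including itself
        if (std::isnan(a) || std::isnan(b)) {
            return false;
        }
        // one value infinite and the other finite, or opposite infinities
        if (std::isinf(a) || std::isinf(b)) {
            return false;
        }
        // finite values: relative tolerance scaled by the larger magnitude
        return std::fabs(a - b) <= rel_tol * std::fmax(std::fabs(a), std::fabs(b));
    }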
* ci : fix xcodebuild destinations
* ci : add .swift to paths
Also adds Falcon-180B support.
Closes #3049
Co-authored-by: jb <jonathan.t.barnard@gmail.com>
Fix uploading tensor data to the device, including 3D, 4D, and non-contiguous tensors (see the stride walk sketched below).
Use correct offsets into data that is already in VRAM.
Handle OpenCL events correctly when multiple commands are queued.
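
The stride walk behind the non-contiguous fix, sketched with ggml-style shapes (`ne`) and byte strides (`nb`); `enqueue_row` is a hypothetical stand-in for the per-row `clEnqueueWriteBuffer` call:

    #include <cstddef>
    #include <cstdint>

    struct tensor4d {
        int64_t ne[4];   // elements per dimension
        size_t  nb[4];   // byte stride per dimension
        const char * data;
    };

    // Visit every row of a possibly non-contiguous 4-D tensor; only dim 0 is
    // assumed contiguous, so each row gets its own offset and upload.
    template <typename F>
    void for_each_row(const tensor4d & t, F enqueue_row) {
        const size_t row_bytes = t.ne[0] * t.nb[0];
        for (int64_t i3 = 0; i3 < t.ne[3]; i3++) {
            for (int64_t i2 = 0; i2 < t.ne[2]; i2++) {
                for (int64_t i1 = 0; i1 < t.ne[1]; i1++) {
                    const size_t offs = i1*t.nb[1] + i2*t.nb[2] + i3*t.nb[3];
                    enqueue_row(t.data + offs, row_bytes);
                }
            }
        }
    }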
Use local GGUF package when possible in Baichuan converter
* add Refact model
* resolve review comments
* rebase onto the latest upstream
* fix the ALiBi CPU error (ALiBi sketched below)
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
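
For context on the ALiBi fix: ALiBi adds a fixed, head-specific linear bias `m_h * (j - i)` to attention scores before softmax. A minimal sketch of the standard slope computation, assuming `n_head` is a power of two:

    #include <cmath>
    #include <vector>

    // Head h (1-based) gets slope m_h = 2^(-8h/n_head); score(i, j) then
    // receives m_h * (j - i) before softmax.
    static std::vector<float> alibi_slopes(int n_head) {
        std::vector<float> m(n_head);
        const float base = std::pow(2.0f, -8.0f / n_head);
        for (int h = 0; h < n_head; h++) {
            m[h] = std::pow(base, float(h + 1));
        }
        return m;
    }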
* sync : ggml (conv 1d + 2d updates)
ggml-ci
* ggml : fix UB in q5_0 and q5_1 quantize code
ggml.c:1033:39: runtime error: left shift of 1 by 31 places cannot be represented in type 'int'
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior
ggml.c:1081:39: runtime error: left shift of 1 by 31 places cannot be represented in type 'int'
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior
ggml-ci
* tests : fix UB in test-quantize-perf
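
The class of bug the sanitizer flagged: building a 32-bit mask of fifth bits with `1 << j` shifts a signed `int` into the sign bit when `j == 31`, which is undefined. A sketch of the defined-behavior version (layout simplified from the real q5 packing):

    #include <cstdint>

    // Pack 32 single-bit values into a uint32_t. Casting to unsigned before
    // the shift makes j == 31 well-defined; (1 << 31) on int would be UB.
    static uint32_t pack_high_bits(const uint8_t fifth_bit[32]) {
        uint32_t qh = 0;
        for (int j = 0; j < 32; j++) {
            qh |= (uint32_t)(fifth_bit[j] & 1) << j;
        }
        return qh;
    }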
Fix small typo
Add RISC-V Vector (RVV) intrinsics for quantization and dot products (#3453)
* Added RVV intrinsics support for Q8 row quantization and improved the existing dot-product functions for RISC-V.
The RVV intrinsics are added for the following quantize-row functions:
quantize_row_q8_0
quantize_row_q8_1
The following dot-product functions have also been optimized by using LMUL = 1/2 instead of LMUL = 1:
ggml_vec_dot_q4_0_q8_0
ggml_vec_dot_q4_1_q8_1
ggml_vec_dot_q5_0_q8_0
ggml_vec_dot_q5_1_q8_1
Vector initialization via a temporary array in Q5 is also replaced by the vid intrinsic (a scalar reference is sketched below).
Signed-off-by: Ahmad Tameem <ahmad.tameem@10xengineers.ai>
* Added RVV intrinsics support for k_quants
This adds RISC-V Vector intrinsics support for the following k_quants functions, for both QK_K = 256 and QK_K = 64:
ggml_vec_dot_q2_K_q8_K
ggml_vec_dot_q3_K_q8_K
ggml_vec_dot_q4_K_q8_K
ggml_vec_dot_q5_K_q8_K
ggml_vec_dot_q6_K_q8_K
Signed-off-by: Ahmad Tameem <ahmad.tameem@10xengineers.ai>
---------
Signed-off-by: Ahmad Tameem <ahmad.tameem@10xengineers.ai>
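
For reference, the scalar shape of the q8_0 row quantization that the RVV code vectorizes, simplified from the real ggml block structs: per 32-element block, pick `d = amax/127` and round each value to `int8`:

    #include <cmath>
    #include <cstdint>

    constexpr int QK8_0 = 32;

    // Simplified reference: x has n floats (n a multiple of QK8_0); q receives
    // the int8 quants and d the per-block scales.
    void quantize_row_q8_0_ref(const float * x, int8_t * q, float * d, int n) {
        for (int b = 0; b < n / QK8_0; b++) {
            float amax = 0.0f;                  // absolute max within the block
            for (int j = 0; j < QK8_0; j++) {
                amax = std::fmax(amax, std::fabs(x[b*QK8_0 + j]));
            }
            const float scale = amax / 127.0f;
            const float inv   = scale != 0.0f ? 1.0f/scale : 0.0f;
            d[b] = scale;
            for (int j = 0; j < QK8_0; j++) {
                q[b*QK8_0 + j] = (int8_t) std::round(x[b*QK8_0 + j] * inv);
            }
        }
    }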
* Typo
* No `--in-prefix` coloring
The `--in-prefix` text was inconsistently colored. Now, it's never colored, just like the `--in-suffix` text.
* llama : fix session saving/loading
* llama : temp fix for clearing "future" tokens from the KV cache
* llama : fix handling of "future" tokens when loading sessions
* llama : fix comments for llama_kv_cache API
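
The "future token" cleanup, sketched under the assumption that the era's `llama_kv_cache_seq_rm(ctx, seq_id, p0, p1)` API is available, with `-1` meaning "all sequences" / "to the end": after restoring a session of `n_token_count` tokens, anything cached at later positions must go before decoding resumes.

    #include "llama.h"

    // Drop every cached token at position >= n_token_count ("the future"),
    // across all sequences, before continuing from a restored session.
    static void trim_restored_cache(llama_context * ctx, int n_token_count) {
        llama_kv_cache_seq_rm(ctx, /*seq_id =*/ -1, /*p0 =*/ n_token_count, /*p1 =*/ -1);
    }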
Expose the model's rope_freq_scale in the API so it can be scaled further before creating a context.
Make the LLAMA_NATIVE flag actually use the instructions supported by the processor (#3273)
* fix LLAMA_NATIVE
* syntax
* alternate implementation
* my eyes must be getting bad...
* set cmake LLAMA_NATIVE=ON by default
* -march=native doesn't work for iOS/tvOS, so disable it for those targets; also see what happens if we use it on MSVC
* revert 8283237 and only allow LLAMA_NATIVE on x86 like the Makefile
* remove -DLLAMA_MPI=ON
---------
Co-authored-by: netrunnereve <netrunnereve@users.noreply.github.com>
* Work on the BPE tokenizer
Tokenizer tests work for Falcon-7B
* Try to fix build problem
* Fix debug assertion failure
* Fix MSVC Unicode BOM problem
* Cleanup and an improvement
* Fix compiler warning
* Cleanup
* Test doesn't work over the full range of Unicode code points
* Update .gitignore and Makefile
* Another Makefile rule
* Testing Aquila
* Moving byte decoding back to `token_to_piece`, because everyone is using it
* Guarding some unusable code paths
* Streamlining code and adding some more assertions
Important change: I'm classifying added tokens as control tokens now for BPE.
* Adding a comment
* Adding another assertion
* Fixed vocabulary guarding assertions
* Fix PR for recent change
* Fix PR for recent change
* Fix for compiler warning
* Fix PR for recent change
* Fix PR for recent change
* Fix PR for recent change
* Fix for compiler warning
* Fixes for more compiler warnings
* Remove unused code
* Fix initialization of static maps
* Add scores and token types back, adapt gptneox
* Update llama.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Update unicode.h
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Update unicode.h
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Ported StarCoder and added some assertions
* Fix coding style
* Apply @jploski's fix for missing tokens
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
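
Background for the byte-decoding move: BPE tokenizers use the GPT-2 byte-to-unicode trick, mapping every byte to a printable code point so token text stays valid UTF-8, and `token_to_piece` has to reverse that map. A sketch of the forward map (illustrative, not the llama.cpp routine):

    #include <map>

    // Printable bytes map to themselves; the rest get code points >= 256.
    // Decoding a token back to raw bytes inverts this map.
    static std::map<int, int> bytes_to_unicode() {
        std::map<int, int> m;
        int n = 0;
        for (int b = 0; b < 256; b++) {
            const bool printable =
                (b >= '!' && b <= '~') || (b >= 0xA1 && b <= 0xAC) || (b >= 0xAE && b <= 0xFF);
            m[b] = printable ? b : 256 + n++;
        }
        return m;
    }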
Broadcast src0 into src1 across dimensions 2 and 3 when needed.
This is required for models that use GQA.
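
The index math behind that broadcast, sketched: for GQA, `src0` (the KV projection, with fewer heads) is repeated so each of its slices in dims 2 and 3 serves several `src1` slices:

    #include <cstdint>

    // Map a src1 index along dim 2 or 3 to the src0 index it broadcasts from;
    // ne1x (the src1 extent) is assumed to be a multiple of ne0x (the src0 extent).
    static int64_t broadcast_index(int64_t i1x, int64_t ne1x, int64_t ne0x) {
        return i1x / (ne1x / ne0x);   // each src0 slice serves ne1x/ne0x src1 slices
    }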
* cmake : fix misuse of cxx_flags
* cmake : make CUDA flags more similar to the Makefile
* cmake : fix MSVC build
The shapes used when initializing GQA models were wrong.
* vvhg-code-infill (#1)
* infill in separate example (#2)
* reverted changes to main and added infill example
* cleanup
* naming improvement
* make : add missing blank line
* fix missing semicolon
* brought infill up to current main code
* cleanup
---------
Co-authored-by: Cebtenzzre <cebtenzzre@gmail.com>
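
How infill-style prompts are assembled, sketched: `<PRE> prefix <SUF> suffix <MID>`, with generation continuing after the middle token. The special-token IDs here are hypothetical stand-ins for the model's actual FIM tokens:

    #include <cstdint>
    #include <vector>

    using token = int32_t;

    // Build the fill-in-the-middle token sequence; the model then generates
    // the missing middle after fim_mid.
    std::vector<token> build_infill_prompt(token fim_pre, token fim_suf, token fim_mid,
                                           const std::vector<token> & prefix,
                                           const std::vector<token> & suffix) {
        std::vector<token> out;
        out.push_back(fim_pre);
        out.insert(out.end(), prefix.begin(), prefix.end());
        out.push_back(fim_suf);
        out.insert(out.end(), suffix.begin(), suffix.end());
        out.push_back(fim_mid);
        return out;
    }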
* ggml-cuda : perform cuBLAS matrix multiplication of quantized types as fp16
* rename CC_TURING to CC_VOLTA
* disable fp16 mat mul completely with multi GPU
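
The gating logic these bullets describe, sketched; the constant mirrors the commit (Volta is compute capability 7.0), while the function name is illustrative:

    #define CC_VOLTA 700   // compute capability 7.0, per the CC_TURING -> CC_VOLTA rename

    // Quantized mat muls go through the fp16 cuBLAS path only on a single GPU
    // of Volta or newer, per the bullets above.
    static bool use_fp16_mat_mul(int compute_capability, int n_gpus) {
        if (n_gpus > 1) {
            return false;   // fp16 path disabled entirely with multi GPU
        }
        return compute_capability >= CC_VOLTA;
    }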
* llama.cpp : add documentation about rope_freq_base and scale values
* add notice to hot topics
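
The formula those docs describe: dimension pair `i` of RoPE rotates at `theta = pos * freq_scale * freq_base^(-2i/n_dims)`, so raising the base or lowering the scale stretches the usable context. A minimal sketch:

    #include <cmath>

    // i indexes the rotation pairs, 0 <= i < n_dims/2; defaults are typically
    // freq_base = 10000 and freq_scale = 1.0.
    static float rope_theta(int pos, int i, int n_dims, float freq_base, float freq_scale) {
        const float inv_freq = std::pow(freq_base, -2.0f * i / n_dims);
        return (float) pos * freq_scale * inv_freq;
    }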
* train : fix KQ_pos allocation
* make sure KQ_pos is not reallocated in finetune
---------
Co-authored-by: xaedes <xaedes@gmail.com>
* llama : enable mmap in quantize on Linux -> 31% faster
* also enable mmap on Windows
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
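
The read-path change, sketched for POSIX (the commit also covers Windows, which would use MapViewOfFile): map the input file read-only so pages stream in on demand instead of being copied up front:

    #include <fcntl.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    // Map a file read-only; returns nullptr on failure. The mapping remains
    // valid after the descriptor is closed.
    static const void * map_file_readonly(const char * path, size_t * size_out) {
        const int fd = open(path, O_RDONLY);
        if (fd < 0) return nullptr;
        struct stat st;
        if (fstat(fd, &st) != 0) { close(fd); return nullptr; }
        void * addr = mmap(nullptr, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        close(fd);
        if (addr == MAP_FAILED) return nullptr;
        *size_out = (size_t) st.st_size;
        return addr;
    }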
* Add link to grammars app per @ggerganov's suggestion
Add a sentence in the Grammars section of the README pointing to the grammars app, per https://github.com/ggerganov/llama.cpp/discussions/2494#discussioncomment-7138211
* Update README.md
* ggml_tensor: update the structure comments.
* remove semicolon
Co-authored-by: slaren <slarengh@gmail.com>
* Update ggml.h
---------
Co-authored-by: Cebtenzzre <cebtenzzre@gmail.com>
Co-authored-by: slaren <slarengh@gmail.com>
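
The convention those comments document: `ne[i]` is the element count in dimension `i` and `nb[i]` the byte stride, so element `(i0, i1, i2, i3)` lives at `data + i0*nb[0] + i1*nb[1] + i2*nb[2] + i3*nb[3]`. A minimal illustration:

    #include <cstddef>
    #include <cstdint>

    // Byte offset of one element given ggml-style byte strides; works for
    // contiguous and non-contiguous (view/permuted) tensors alike.
    static size_t element_offset(const size_t nb[4],
                                 int64_t i0, int64_t i1, int64_t i2, int64_t i3) {
        return i0*nb[0] + i1*nb[1] + i2*nb[2] + i3*nb[3];
    }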
* Release the requested thread pool resource
* Release the requested thread pool resource 2
---------
Co-authored-by: Zongfu ZF3 Qu <quzf3@Lenovo.com>
* llama.cpp : split llama_context_params into model and context params
ggml-ci
* fix metal build
* fix freq_base/scale default to model value
* llama-bench : keep the same model between tests when possible
* move n_threads to llama_context_params, add n_threads_batch
* fix mpi build
* remove kv_size(), cuda scratch fixes
* remove low-vram option
* add n_threads_batch to system info, refactor to get_system_info()
* add documentation about --threads-batch to the READMEs
* llama-bench fix
* main : fix rope freq/scale warning
* llama.cpp : add llama_get_model
common : add llama_tokenize from model
* remove duplicated ctx/model functions
ggml-ci
* cuda : print total VRAM used
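
The API shape after the split, sketched: model-level settings travel in `llama_model_params`, per-context settings (including the new `n_threads`/`n_threads_batch`) in `llama_context_params`. Call names match llama.h of that era; treat the exact fields as illustrative:

    #include "llama.h"

    // Load a model once, then create a context with per-context settings.
    llama_context * open_model(const char * path) {
        llama_model_params   mparams = llama_model_default_params();
        llama_context_params cparams = llama_context_default_params();
        cparams.n_ctx           = 4096;
        cparams.n_threads       = 8;   // generation threads
        cparams.n_threads_batch = 8;   // prompt/batch threads (added in this change)
        llama_model * model = llama_load_model_from_file(path, mparams);
        return model ? llama_new_context_with_model(model, cparams) : nullptr;
    }

Splitting the params also lets llama-bench keep the same model loaded between tests, as one of the bullets notes.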
* Mac and Linux threads
* Windows
* Update build.yml
* Update build.yml
* Update build.yml
* automatically get thread count
* Windows syntax
* try to fix FreeBSD
* Update build.yml
* Update build.yml
* Update build.yml