path: root/ggml-cuda.cu
Age        | Commit message                                                              | Author
2023-12-07 | sync : ggml (new ops, tests, backend, etc.) (#4359)                         | Georgi Gerganov
2023-12-07 | llama : per-layer KV cache + quantum K cache (#4309)                        | Georgi Gerganov
2023-12-01 | ggml : add ggml_soft_max_ext (#4256)                                        | Georgi Gerganov
2023-11-24 | ggml-cuda : support stablelm rope (#4156)                                   | slaren
2023-11-23 | Fix incorrect format strings and uninitialized variables. (#4133)           | Haohui Mai
2023-11-18 | Clean up ggml-cuda.cu warnings when compiling with clang (for ROCM) (#4124) | Kerfuffle
2023-11-17 | cuda : get_row_rounding F32 (#4095)                                         | Andrew Godfrey
2023-11-17 | llama : fix data units (#4101)                                              | Georgi Gerganov
2023-11-15 | ggml-cuda : increase max graph size (#4084)                                 | slaren
2023-11-13 | ggml : sync (im2col, GPU conv, 32-bit arm compat) (#4060)                   | Georgi Gerganov
2023-11-13 | sync : ggml (backend v2) (#3912)                                            | Georgi Gerganov
2023-11-13 | Add ReLU and SQR CUDA ops to (partially) fix Persimmon offloading (#4041)   | Kerfuffle
2023-11-07 | cuda : supports running on CPU for GGML_USE_CUBLAS=ON build (#3946)         | Meng Zhang
2023-11-05 | ggml-cuda : fix f16 mul mat (#3961)                                         | slaren
2023-11-05 | cuda : fix disabling device with --tensor-split 1,0 (#3951)                 | Jared Van Bortel
2023-11-05 | cuda : revert CUDA pool stuff (#3944)                                       | slaren
2023-11-03 | ggml-cuda : move row numbers to x grid dim in mmv kernels (#3921)           | slaren
2023-11-02 | cuda : add ROCM aliases for CUDA pool stuff (#3918)                         | Kerfuffle
2023-11-02 | cuda : fix const ptrs warning causing ROCm build issues (#3913)             | Georgi Gerganov
2023-11-02 | cuda : use CUDA memory pool with async memory allocation/deallocation when av... | Oleksii Maryshchenko
2023-11-02 | cuda : check if this fixes Pascal card regression (#3882)                   | Georgi Gerganov
2023-11-02 | cuda : fix RoPE after #2268 (#3897)                                         | cebtenzzre
2023-11-01 | ggml-cuda : compute ptrs for cublasGemmBatchedEx in a kernel (#3891)        | slaren
2023-11-01 | llama : implement YaRN RoPE scaling (#2268)                                 | cebtenzzre
2023-11-01 | finetune : add -ngl parameter (#3762)                                       | Andrew Godfrey
2023-10-27 | cuda : improve text-generation and batched decoding performance (#3776)     | Georgi Gerganov
2023-10-25 | batched-bench : print params at start                                       | Georgi Gerganov
2023-10-24 | sync : ggml (conv ops + cuda MSVC fixes) (#3765)                            | Georgi Gerganov
2023-10-24 | cuda : add batched cuBLAS GEMM for faster attention (#3749)                 | Georgi Gerganov
2023-10-10 | llm : add MPT support (#3417)                                               | Jan Ploski
2023-10-08 | sync : ggml (ggml-backend) (#3548)                                          | Georgi Gerganov
2023-09-30 | ggml-cuda : perform cublas mat mul of quantized types as f16 (#3412)        | slaren
2023-09-28 | llama.cpp : split llama_context_params into model and context params (#3301) | slaren
2023-09-28 | llama : custom attention mask + parallel decoding + no context swaps (#3228) | Georgi Gerganov
2023-09-28 | ggml-cuda : perform cublas fp16 matrix multiplication as fp16 (#3370)       | slaren
2023-09-17 | CUDA: fix peer access logic (#3231)                                         | Johannes Gäßler
2023-09-17 | CUDA: enable peer access between devices (#2470)                            | Johannes Gäßler
2023-09-17 | CUDA: fix scratch malloced on non-main device (#3220)                       | Johannes Gäßler
2023-09-16 | Enable build with CUDA 11.0 (make) (#3132)                                  | Vlad
2023-09-13 | CUDA: mul_mat_q RDNA2 tunings (#2910)                                       | Johannes Gäßler
2023-09-13 | CUDA: fix LoRAs (#3130)                                                     | Johannes Gäßler
2023-09-11 | CUDA: fix mul_mat_q not used for output tensor (#3127)                      | Johannes Gäßler
2023-09-11 | CUDA: lower GPU latency + fix Windows performance (#3110)                   | Johannes Gäßler
2023-09-11 | CUDA: add device number to error messages (#3112)                           | Johannes Gäßler
2023-09-08 | sync : ggml (CUDA GLM RoPE + POSIX) (#3082)                                 | Georgi Gerganov
2023-09-04 | 2x faster (rms) norm cuda kernels (3.7% e2e improvement) (#2985)            | Jiahao Li
2023-09-01 | cuda : vsubss4 for older versions of ROCm/clang (#2942)                     | Engininja2
2023-08-28 | CUDA: fix RoPE asserts, block sizes (#2833)                                 | Johannes Gäßler
2023-08-27 | falcon : fix CUDA inference by making K and Q contiguous (#2830)            | Georgi Gerganov
2023-08-27 | k_quants tuning for Falcon-7b (#2816)                                       | Kawrakow