path: root/ggml-cuda
Date       | Commit message                                                       | Author
2024-06-17 | Add support for sqrt on CUDA (#7953)                                 | Calvin Laurenson
2024-06-16 | cuda : fix bounds check for src0 rows in MMVQ kernel (whisper/2231)  | Georgi Gerganov
2024-06-14 | CUDA: faster q2_K, q3_K MMQ + int8 tensor cores (#7921)              | Johannes Gäßler
2024-06-12 | CUDA: fix broken oob check for FA vec f32 kernel (#7904)             | Johannes Gäßler
2024-06-12 | tests : add non-cont unary tests (#7857)                             | Georgi Gerganov
2024-06-11 | CUDA: int8 tensor cores for MMQ (q4_K, q5_K, q6_K) (#7860)           | Johannes Gäßler
2024-06-10 | CUDA: use tensor cores for MMQ (#7676)                               | Johannes Gäßler
2024-06-09 | CUDA: revise q8_1 data layout for mul_mat_q (#7824)                  | Johannes Gäßler
2024-06-05 | CUDA: refactor mmq, dmmv, mmvq (#7716)                               | Johannes Gäßler
2024-06-05 | ggml : refactor rope norm/neox (#7634)                               | Georgi Gerganov
2024-06-01 | Fix FlashAttention debug test, FP32 assert (#7684)                   | Johannes Gäßler
2024-06-01 | CUDA: fix Pascal FA, deq. KV to FP16 for batch > 8 (#7681)           | Johannes Gäßler
2024-06-01 | CUDA: quantized KV support for FA vec (#7527)                        | Johannes Gäßler
2024-05-29 | ggml : fix YARN + add tests + add asserts (#7617)                    | Georgi Gerganov
2024-05-29 | cuda : non-cont concat support (#7610)                               | Georgi Gerganov
2024-05-28 | ggml : generalize GGML_OP_CONCAT (#7563)                             | Georgi Gerganov
2024-05-28 | update HIP_UMA #7399 (#7414)                                         | Djip007
2024-05-23 | ggml : drop support for QK_K=64 (#7473)                              | Georgi Gerganov
2024-05-23 | CUDA: fix FA out-of-bounds reads (#7479)                             | Johannes Gäßler
2024-05-22 | CUDA: fix FA out-of-bounds writes (#7465)                            | Johannes Gäßler
2024-05-22 | cuda : fix compile warning (#7454)                                   | Georgi Gerganov
2024-05-22 | CUDA: remove incorrect precision check (#7454)                       | Johannes Gäßler
2024-05-22 | cuda : fix rope + add tests (#7452)                                  | Georgi Gerganov
2024-05-21 | llama : add phi3 128K model support (#7225)                          | liuwei-git
2024-05-21 | CUDA: fix unused warning in mmq.cu (#7442)                           | Johannes Gäßler
2024-05-21 | CUDA: deduplicate mmq code (#7397)                                   | Johannes Gäßler
2024-05-18 | CUDA: deduplicate FlashAttention code (#7352)                        | Johannes Gäßler
2024-05-18 | cuda : add half2 __shfl_xor() for ROCm 5.5 (#7263)                   | Engininja2
2024-05-17 | CUDA: faster large batch FA without tensor cores (#7314)             | Johannes Gäßler
2024-05-15 | ggml : add `ggml_upscale_ext` (ggml/814)                             | John Balis
2024-05-12 | CUDA: add FP32 FlashAttention vector kernel (#7188)                  | Johannes Gäßler
2024-05-11 | feat: implemented sigmoid function (ggml/806)                        | Justina Cho
2024-05-11 | ggml : full ALiBi support (#7192)                                    | Georgi Gerganov
2024-05-09 | CUDA: generalize FP16 fattn vec kernel (#7061)                       | Johannes Gäßler
2024-05-08 | Introduction of CUDA Graphs to LLama.cpp (#6766)                     | agray3
2024-05-01 | CUDA: CUDART < 11.7 workaround for __hmax, __hmax2 (#7019)           | Johannes Gäßler
2024-04-30 | ggml : add Flash Attention (#5021)                                   | Georgi Gerganov
2024-04-29 | Fix more int overflow during quant (PPL/CUDA). (#6563)               | DAN™
2024-04-18 | ggml : group all experts in a single ggml_mul_mat_id (#6505)         | slaren
2024-04-09 | llama : add Command R Plus support (#6491)                           | Carolinabanana
2024-04-03 | ggml : mul_mat_id use the same tensor for all the experts (#6387)    | slaren
2024-03-29 | sync : ggml (#6351)                                                  | Georgi Gerganov
2024-03-26 | IQ1_M: 1.75 bpw quantization (#6302)                                 | Kawrakow
2024-03-25 | cuda : fix LLAMA_CUDA_F16 build (#6298)                              | slaren
2024-03-25 | cuda : refactor into multiple files (#6269)                          | slaren