ik_llama.cpp.git (branch: main) - log for path ggml-cuda/common.cuh
Age         Commit message                                               Author
2024-06-22  Bitnet(1.75 bpw): higher precision fp8 scale                 Iwan Kawrakow
2024-06-22  Bitnet(2.25 bpw): CUDA                                       Iwan Kawrakow
2024-06-22  bitnet: CUDA, scalar, AVX2                                   Iwan Kawrakow
2024-06-20  CUDA: stream-k decomposition for MMQ (#8018)                 Johannes Gäßler
2024-06-14  CUDA: faster q2_K, q3_K MMQ + int8 tensor cores (#7921)      Johannes Gäßler
2024-06-10  CUDA: use tensor cores for MMQ (#7676)                       Johannes Gäßler
2024-06-05  CUDA: refactor mmq, dmmv, mmvq (#7716)                       Johannes Gäßler
2024-05-28  update HIP_UMA #7399 (#7414)                                 Djip007
2024-05-18  CUDA: deduplicate FlashAttention code (#7352)                Johannes Gäßler
2024-05-18  cuda : add half2 __shfl_xor() for ROCm 5.5 (#7263)           Engininja2
2024-05-12  CUDA: add FP32 FlashAttention vector kernel (#7188)          Johannes Gäßler
2024-05-09  CUDA: generalize FP16 fattn vec kernel (#7061)               Johannes Gäßler
2024-05-08  Introduction of CUDA Graphs to LLama.cpp (#6766)             agray3
2024-05-01  CUDA: CUDART < 11.7 workaround for __hmax, __hmax2 (#7019)   Johannes Gäßler
2024-04-30  ggml : add Flash Attention (#5021)                           Georgi Gerganov
2024-04-09  llama : add Command R Plus support (#6491)                   Carolinabanana
2024-03-29  sync : ggml (#6351)                                          Georgi Gerganov
2024-03-25  cuda : refactor into multiple files (#6269)                  slaren