path: root/ggml-cuda.cu
Age        | Commit message | Author
2023-10-10 | llm : add MPT support (#3417) | Jan Ploski
2023-10-08 | sync : ggml (ggml-backend) (#3548) | Georgi Gerganov
2023-09-30 | ggml-cuda : perform cublas mat mul of quantized types as f16 (#3412) | slaren
2023-09-28 | llama.cpp : split llama_context_params into model and context params (#3301) | slaren
2023-09-28 | llama : custom attention mask + parallel decoding + no context swaps (#3228) | Georgi Gerganov
2023-09-28 | ggml-cuda : perform cublas fp16 matrix multiplication as fp16 (#3370) | slaren
2023-09-17 | CUDA: fix peer access logic (#3231) | Johannes Gäßler
2023-09-17 | CUDA: enable peer access between devices (#2470) | Johannes Gäßler
2023-09-17 | CUDA: fix scratch malloced on non-main device (#3220) | Johannes Gäßler
2023-09-16 | Enable build with CUDA 11.0 (make) (#3132) | Vlad
2023-09-13 | CUDA: mul_mat_q RDNA2 tunings (#2910) | Johannes Gäßler
2023-09-13 | CUDA: fix LoRAs (#3130) | Johannes Gäßler
2023-09-11 | CUDA: fix mul_mat_q not used for output tensor (#3127) | Johannes Gäßler
2023-09-11 | CUDA: lower GPU latency + fix Windows performance (#3110) | Johannes Gäßler
2023-09-11 | CUDA: add device number to error messages (#3112) | Johannes Gäßler
2023-09-08 | sync : ggml (CUDA GLM RoPE + POSIX) (#3082) | Georgi Gerganov
2023-09-04 | 2x faster (rms) norm cuda kernels (3.7% e2e improvement) (#2985) | Jiahao Li
2023-09-01 | cuda : vsubss4 for older versions of ROCm/clang (#2942) | Engininja2
2023-08-28 | CUDA: fix RoPE asserts, block sizes (#2833) | Johannes Gäßler
2023-08-27 | falcon : fix CUDA inference by making K and Q contiguous (#2830) | Georgi Gerganov
2023-08-27 | k_quants tuning for Falcon-7b (#2816) | Kawrakow
2023-08-25 | ROCm Port (#1087) | Henri Vasserman
2023-08-25 | cuda : add RoPE kernel for mode == 2 (NeoX) (#2760) | Georgi Gerganov
2023-08-23 | llm : add Falcon support (#2717) | Georgi Gerganov
2023-08-22 | CUDA: use mul_mat_q kernels by default (#2683) | Johannes Gäßler
2023-08-22 | Fix CUDA softmax by subtracting max value before exp (#2665) | Jiahao Li
2023-08-22 | ggml-cuda : use graph allocator (#2684) | slaren
2023-08-22 | ggml : sync latest (SAM + SD operators, CUDA alibi) (#2709) | Georgi Gerganov
2023-08-18 | llama : add benchmark example (#2626) | slaren
2023-08-14 | CUDA: launch_bounds, small q4_K, q5_K mmq refactor (#2596) | Johannes Gäßler
2023-08-13 | CUDA: Fixed OpenLLaMA 3b mmq, reduced compile time (#2590) | Johannes Gäßler
2023-08-09 | CUDA: tuned mul_mat_q kernels (#2546) | Johannes Gäßler
2023-08-05 | CUDA: faster k-quant mul_mat_q kernels (#2525) | Johannes Gäßler
2023-08-04 | CUDA: use min compute capability of GPUs actually used (#2506) | Cebtenzzre
2023-08-04 | CUDA: check if event is NULL before cudaStreamWaitEvent (#2505) | Cebtenzzre
2023-08-02 | CUDA: faster non k-quant mul_mat_q kernels (#2483) | Johannes Gäßler
2023-08-02 | CUDA: Fix models with output size != 32000 (#2480) | Johannes Gäßler
2023-07-31 | CUDA: mmq CLI option, fixed mmq build issues (#2453) | Johannes Gäßler
2023-07-31 | CUDA: Implemented row flattening for non-glm RoPE (#2468) | Johannes Gäßler
2023-07-31 | CUDA: fewer memory bank conflicts for mul_mat_q (#2458) | Johannes Gäßler
2023-07-29 | CUDA: Quantized matrix matrix multiplication (#2160) | Johannes Gäßler
2023-07-29 | CUDA: faster multi GPU synchronization (#2448) | Johannes Gäßler
2023-07-25 | Fix Q4_K and Q5_K for QK_K = 64 on CUDA (#2359) | Kawrakow
2023-07-24 | make rms_norm_eps a parameter (#2374) | slaren
2023-07-24 | ggml : sync (unary ops refactor, static-correctness) (#2370) | Georgi Gerganov
2023-07-24 | Some more Q4_K and Q5_K speedup on CUDA (#2346) | Kawrakow
2023-07-23 | ggml: move op parameters from tensors to ggml_tensor::op_params (#2333) | slaren
2023-07-23 | llama : grouped-query attention + LLaMAv2 70B support (#2276) | Georgi Gerganov
2023-07-23 | Speed up Q4_K (#2322) | Kawrakow
2023-07-22 | CUDA: Fixed 7b q3_K_S with mul_mat_vec_q (#2313) | Johannes Gäßler