| Date | Commit message | Author |
|---|---|---|
| 2023-08-22 | CUDA: use mul_mat_q kernels by default (#2683) | Johannes Gäßler |
| 2023-08-22 | Fix CUDA softmax by subtracting max value before exp (#2665) | Jiahao Li |
| 2023-08-22 | ggml-cuda : use graph allocator (#2684) | slaren |
| 2023-08-22 | ggml : sync latest (SAM + SD operators, CUDA alibi) (#2709) | Georgi Gerganov |
| 2023-08-18 | llama : add benchmark example (#2626) | slaren |
| 2023-08-14 | CUDA: launch_bounds, small q4_K, q5_K mmq refactor (#2596) | Johannes Gäßler |
| 2023-08-13 | CUDA: Fixed OpenLLaMA 3b mmq, reduced compile time (#2590) | Johannes Gäßler |
| 2023-08-09 | CUDA: tuned mul_mat_q kernels (#2546) | Johannes Gäßler |
| 2023-08-05 | CUDA: faster k-quant mul_mat_q kernels (#2525) | Johannes Gäßler |
| 2023-08-04 | CUDA: use min compute capability of GPUs actually used (#2506) | Cebtenzzre |
| 2023-08-04 | CUDA: check if event is NULL before cudaStreamWaitEvent (#2505) | Cebtenzzre |
| 2023-08-02 | CUDA: faster non k-quant mul_mat_q kernels (#2483) | Johannes Gäßler |
| 2023-08-02 | CUDA: Fix models with output size != 32000 (#2480) | Johannes Gäßler |
| 2023-07-31 | CUDA: mmq CLI option, fixed mmq build issues (#2453) | Johannes Gäßler |
| 2023-07-31 | CUDA: Implemented row flattening for non-glm RoPE (#2468) | Johannes Gäßler |
| 2023-07-31 | CUDA: fewer memory bank conflicts for mul_mat_q (#2458) | Johannes Gäßler |
| 2023-07-29 | CUDA: Quantized matrix matrix multiplication (#2160) | Johannes Gäßler |
| 2023-07-29 | CUDA: faster multi GPU synchronization (#2448) | Johannes Gäßler |
| 2023-07-25 | Fix Q4_K and Q5_K for QK_K = 64 on CUDA (#2359) | Kawrakow |
| 2023-07-24 | make rms_norm_eps a parameter (#2374) | slaren |
| 2023-07-24 | ggml : sync (unary ops refactor, static-correctness) (#2370) | Georgi Gerganov |
| 2023-07-24 | Some more Q4_K and Q5_K speedup on CUDA (#2346) | Kawrakow |
| 2023-07-23 | ggml: move op parameters from tensors to ggml_tensor::op_params (#2333) | slaren |
| 2023-07-23 | llama : grouped-query attention + LLaMAv2 70B support (#2276) | Georgi Gerganov |
| 2023-07-23 | Speed up Q4_K (#2322) | Kawrakow |
| 2023-07-22 | CUDA: Fixed 7b q3_K_S with mul_mat_vec_q (#2313) | Johannes Gäßler |
| 2023-07-21 | Custom RoPE + better memory management for CUDA (#2295) | Kawrakow |
| 2023-07-21 | llama : make tensor_split ptr instead of array (#2272) | Georgi Gerganov |
| 2023-07-17 | Support dup & cont ops on CUDA (#2242) | Jiahao Li |
| 2023-07-14 | cuda : allocate all temporary ggml_tensor_extra_gpu from a fixed-size buffer ... | Bach Le |
| 2023-07-14 | cuda : support broadcast add & mul (#2192) | Jiahao Li |
| 2023-07-14 | CUDA: mul_mat_vec_q kernels for k-quants (#2203) | Johannes Gäßler |
| 2023-07-14 | ggml : sync (ggml_conv_2d, fix mul_mat bug, CUDA GLM rope) | Georgi Gerganov |
| 2023-07-13 | Fix compile error on Windows CUDA (#2207) | Howard Su |
| 2023-07-12 | cuda : add gelu support | Georgi Gerganov |
| 2023-07-12 | Fixed __dp4a compute capability: 6.0 -> 6.1 (#2189) | Johannes Gäßler |
| 2023-07-12 | ggml : revert CUDA broadcast changes from #2183 (#2191) | Georgi Gerganov |
| 2023-07-11 | ggml : sync (abort callback, mul / add broadcast, fix alibi) (#2183) | Georgi Gerganov |
| 2023-07-11 | ggml : remove src0 and src1 from ggml_tensor and rename opt to src (#2178) | Spencer Sutton |
| 2023-07-08 | Fixed OpenLLaMA 3b CUDA mul_mat_vec_q (#2144) | Johannes Gäßler |
| 2023-07-08 | CUDA: add __restrict__ to mul mat vec kernels (#2140) | Johannes Gäßler |
| 2023-07-05 | Quantized dot products for CUDA mul mat vec (#2067) | Johannes Gäßler |
| 2023-07-03 | Fix crash of test-tokenizer-0 under Debug build (#2064) | Howard Su |
| 2023-07-01 | Better CUDA synchronization logic (#2057) | Johannes Gäßler |
| 2023-06-28 | cuda : remove nchannels_x argument from mul_mat_vec_nc_f16_f32 (#2028) | Salvador E. Tropea |
| 2023-06-28 | cuda : fix missing const qualifier in casts (#2027) | Salvador E. Tropea |
| 2023-06-28 | CUDA GPU acceleration for LoRAs + f16 models (#1970) | Johannes Gäßler |
| 2023-06-26 | k-quants : support for super-block size of 64 (#2001) | Kawrakow |
| 2023-06-26 | Fix assert when free invalid cuda pointer (#2005) | Howard Su |
| 2023-06-24 | #1869 Fix null reference errors when training from scratch with CUDA (#1907) | Robyn |