| Age | Commit message | Author |
| --- | --- | --- |
| 2024-04-24 | llama : add phi 3 chat template (#6857) | Tristan Druyen |
| 2024-04-21 | llama : add llama-3 chat template (#6751) | Wouter |
| 2024-04-18 | ggml : group all experts in a single ggml_mul_mat_id (#6505) | slaren |
| 2024-04-16 | llama : add qwen2moe (#6074) | Shijie |
| 2024-04-15 | `main`: add --json-schema / -j flag (#6659) | Olivier Chafik |
| 2024-04-14 | Add Command R chat template (#6650) | Chao Jiang |
| 2024-04-12 | JSON schema conversion: ⚡️ faster repetitions, min/maxLength for strings,... | Olivier Chafik |
| 2024-04-12 | metal : unify mul_mv_id kernels (#6556) | slaren |
| 2024-04-11 | grammars: 1.5x faster inference w/ complex grammars (vector reserves / reuses... | Olivier Chafik |
| 2024-04-06 | Tests: Added integration tests for GBNF parser (#6472) | Clint Herron |
| 2024-04-03 | Add OpenChat, Alpaca, Vicuna chat templates (#6397) | kaizau |
| 2024-04-03 | ggml : mul_mat_id use the same tensor for all the experts (#6387) | slaren |
| 2024-03-26 | IQ1_M: 1.75 bpw quantization (#6302) | Kawrakow |
| 2024-03-25 | tests : include IQ2_XXS and IQ2_XS in test-quantize-fns (#6303) | Kawrakow |
| 2024-03-22 | tests : conditional python & node json schema tests (#6207) | Olivier Chafik |
| 2024-03-22 | json-schema-to-grammar : fix order of props + non-str const/enum (#6232) | Olivier Chafik |
| 2024-03-22 | metal : pad n_ctx by 32 (#6177) | Georgi Gerganov |
| 2024-03-21 | tests : disable system() calls (#6198) | Georgi Gerganov |
| 2024-03-21 | json-schema-to-grammar improvements (+ added to server) (#5978) | Olivier Chafik |
| 2024-03-15 | llama : add Orion chat template (#6066) | Xuan Son Nguyen |
| 2024-03-13 | test-backend-ops : skip CPU backend by default (#6028) | slaren |
| 2024-03-11 | llama : refactor unicode stuff (#5992) | Georgi Gerganov |
| 2024-03-09 | ggml : remove old quantization functions (#5942) | Georgi Gerganov |
| 2024-03-09 | tests : gitignore ggml-common.h | Georgi Gerganov |
| 2024-03-04 | add some new ops, fix some operators and add batch operations to certain oper... | leejet |
| 2024-02-27 | IQ4_XS: a 4.25 bpw quantization (#5747) | Kawrakow |
| 2024-02-26 | Adding IQ2_S and IQ2_M to complete coverage of the 2-3 bit quantization range... | Kawrakow |
| 2024-02-25 | code : normalize enum names (#5697) | Georgi Gerganov |
| 2024-02-24 | IQ3_S: a much better alternative to Q3_K (#5676) | Kawrakow |
| 2024-02-22 | Add Gemma chat template (#5665) | Xuan Son Nguyen |
| 2024-02-22 | server : fallback to chatml, add AlphaMonarch chat template (#5628) | Xuan Son Nguyen |
| 2024-02-21 | IQ4_NL: 4-bit non-linear quants with blocks of 32 (#5590) | Kawrakow |
| 2024-02-19 | llama : add llama_chat_apply_template() (#5538) | Xuan Son Nguyen |
| 2024-02-18 | ggml, common, examples, tests : fixed type arguments in printf (#5528) | Herman Semenov |
| 2024-02-18 | 1.5 bit quantization (#5453) | Kawrakow |
| 2024-02-17 | ggml : add ALiBi support for ggml_soft_max_ext (#5488) | Georgi Gerganov |
| 2024-02-16 | ggml : add numa options (#5377) | bmwl |
| 2024-02-13 | tests : multi-thread the tokenizer tests (#5474) | Georgi Gerganov |
| 2024-02-13 | tests : disable moe test (#5473) | Georgi Gerganov |
| 2024-02-11 | ggml : add mmla kernels for quantized GEMM (#4966) | snadampal |
| 2024-02-08 | sampling: fix top_k <= 0 (#5388) | Johannes Gäßler |
| 2024-02-08 | tests : .gitignore obj files | Georgi Gerganov |
| 2024-02-03 | refactor : switch to emplace_back to avoid extra object (#5291) | Michael Klimenko |
| 2024-01-31 | llava : add MobileVLM support (#5132) | JidongZhang-THU |
| 2024-01-30 | `ggml_cuda_cpy` support for 4d tensors and float16->float32 upcasting (ggml/686) | John Balis |
| 2024-01-30 | SOTA 3-bit quants (#5196) | Kawrakow |
| 2024-01-29 | Nomic Vulkan backend (#4456) | Jared Van Bortel |
| 2024-01-28 | ggml : add unified SYCL backend for Intel GPUs (#2690) | Abhilash Majumder |
| 2024-01-28 | Tests for min_p, sampling queue (#5147) | Johannes Gäßler |
| 2024-01-27 | Remove unused data and add fixes (#5154) | Michael Klimenko |