Age | Commit message | Author |
---|---|---|
2023-10-24 | cuda : add batched cuBLAS GEMM for faster attention (#3749) | Georgi Gerganov |
2023-10-23 | llama : remove token functions with `context` args in favor of `model` (#3720) | Marcus Dunn |
2023-10-22 | batched : add len CLI argument | Georgi Gerganov |
2023-10-18 | speculative : add tree-based sampling example (#3624) | Georgi Gerganov |
2023-10-11 | batched : add bench tool (#3545) | Georgi Gerganov |
2023-09-28 | llama.cpp : split llama_context_params into model and context params (#3301) | slaren |
2023-09-28 | llama : custom attention mask + parallel decoding + no context swaps (#3228) | Georgi Gerganov |