path: root/examples/llama-bench/llama-bench.cpp
Age  Commit message  Author
2024-08-12  Merge mainline - Aug 12 2024 (#17)  Kawrakow
2024-07-27  Merge mainline llama.cpp (#3)  Kawrakow
2024-06-14  llama-bench : fix RPC indication (#7936)  Radoslav Gerganov
2024-06-13  move BLAS to a separate backend (#6210)  slaren
2024-06-11  llama-bench: more compact markdown tables (#7879)  Johannes Gäßler
2024-06-04  common : refactor cli arg parsing (#7675)  Georgi Gerganov
2024-06-04  ggml : remove OpenCL (#7735)  Georgi Gerganov
2024-06-04  llama-bench : allow using a different printer for stderr with -oe (#7722)  slaren
2024-05-29  llama-bench : add support for the RPC backend (#7435)  Radoslav Gerganov
2024-05-22  common : normalize naming style (#7462)  Georgi Gerganov
2024-05-22  phi3 : duplicate rope factors in each layer (#7447)  slaren
2024-05-10  llama-bench : add pp+tg test type (#7199)  slaren
2024-05-05  Adding support for the --numa argument for llama-bench. (#7080)  kunnis
2024-04-30  ggml : add Flash Attention (#5021)  Georgi Gerganov
2024-04-16  ggml : add llamafile sgemm (#6414)  Justine Tunney
2024-03-26  cuda : rename build flag to LLAMA_CUDA (#6299)  slaren
2024-03-21  Add ability to use Q5_0, Q5_1, and IQ4_NL for quantized K cache (#6183)  Kawrakow
2024-03-18  backend : offload large batches to GPU (#6083)  slaren
2024-03-15  llama-bench : use random tokens to improve accuracy with mixtral (#6069)  slaren
2024-03-14  gguf : fix resource leaks (#6061)  Steve Grubb
2024-03-13  llama : add pipeline parallelism support (#6017)  slaren
2024-03-07  llama-bench : add embeddings option (#5924)  Georgi Gerganov
2024-03-02  Support multiple GPUs (split mode) on SYCL backend (#5806)  Neo Zhang Jianyu
2024-03-01  llama : cleanup unused mmq flags (#5772)  Pierrick Hymbert
2024-02-25  code : normalize enum names (#5697)  Georgi Gerganov
2024-02-16  ggml : add numa options (#5377)  bmwl
2024-02-03  refactor : switch to emplace_back to avoid extra object (#5291)  Michael Klimenko
2024-02-01  add --no-mmap in llama-bench (#5257)  Neo Zhang Jianyu
2024-01-31  llama : remove LLAMA_MAX_DEVICES and LLAMA_SUPPORTS_GPU_OFFLOAD (#5240)  Georgi Gerganov
2024-01-30  kompute : llama-bench support and ggml_cpu_has_kompute() (#5226)  Jared Van Bortel
2024-01-28  ggml : add Vulkan backend (#2059)  0cc4m
2024-01-12  llama : ggml-backend integration (#4766)  slaren
2024-01-07  llama-bench : add no-kv-offload parameter (#4812)  slaren
2023-12-07  llama : per-layer KV cache + quantum K cache (#4309)  Georgi Gerganov
2023-11-02  build : link against build info instead of compiling against it (#3879)  cebtenzzre
2023-10-29  Extend llama_kv_cache_seq_rm to allow matching any sequence (#3843)  Kerfuffle
2023-10-23  llama : remove token functions with `context` args in favor of `model` (#3720)  Marcus Dunn
2023-09-28  build : enable more non-default compiler warnings (#3200)  Cebtenzzre
2023-09-28  llama.cpp : split llama_context_params into model and context params (#3301)  slaren
2023-09-28  llama : custom attention mask + parallel decoding + no context swaps (#3228)  Georgi Gerganov
2023-09-27  metal : reusing llama.cpp logging (#3152)  Rickard Hallerbäck
2023-09-15  sync : ggml (Metal F32 support + reduce ggml-alloc size) (#3192)  Georgi Gerganov
2023-09-07  llama-bench : use two tokens in the warmup run for prompt evals (#3059)  slaren
2023-09-05  examples : replace fprintf to stdout with printf (#3017)  Cebtenzzre
2023-09-04  llama-bench : make cpp file non-executable (#2999)  Cebtenzzre
2023-08-28  llama-bench : set locale to utf8 (#2832)  slaren
2023-08-25  llama-bench : add model sizes (#2771)  slaren
2023-08-25  ROCm Port (#1087)  Henri Vasserman
2023-08-22  llama-bench : minor fixes (#2695)  slaren
2023-08-21  gguf : new file format with flexible meta data (beta) (#2398)  Georgi Gerganov