ik_llama.cpp.git: commit log for examples/llama-bench/llama-bench.cpp (branch: main)

Age         Author              Commit message
----------  ------------------  ------------------------------------------------------------------
2025-02-10  saood06             Load all MoE experts during warmup and make warmup 1 token (#198)
2025-02-09  Kawrakow            Add optional MLA (#188)
2025-01-30  Kawrakow            Faster Q4_K_R4 and Q5_K_R4 on AVX2/Zen4 (#182)
2025-01-29  Kawrakow            Various (#181)
2024-12-17  Kawrakow            Slightly better matrix x vector on Zen4/AVX2 for iq2_k_r4, iq3_k_r4, iq4_k_r4...
2024-12-17  Kawrakow            Be able to repack tensors at run time (#147)
2024-10-02  Kawrakow            Adding Q6_0 (#77)
2024-09-05  Kawrakow            Zen4 Flash Attention - bf16 support (#38)
2024-08-20  Kawrakow            Fused soft cap and SIMD-ified GeLU (#9)
2024-08-12  Kawrakow            Merge mainline - Aug 12 2024 (#17)
2024-07-27  Kawrakow            Merge mainline llama.cpp (#3)
2024-06-14  Radoslav Gerganov   llama-bench : fix RPC indication (#7936)
2024-06-13  slaren              move BLAS to a separate backend (#6210)
2024-06-11  Johannes Gäßler     llama-bench: more compact markdown tables (#7879)
2024-06-04  Georgi Gerganov     common : refactor cli arg parsing (#7675)
2024-06-04  Georgi Gerganov     ggml : remove OpenCL (#7735)
2024-06-04  slaren              llama-bench : allow using a different printer for stderr with -oe (#7722)
2024-05-29  Radoslav Gerganov   llama-bench : add support for the RPC backend (#7435)
2024-05-22  Georgi Gerganov     common : normalize naming style (#7462)
2024-05-22  slaren              phi3 : duplicate rope factors in each layer (#7447)
2024-05-10  slaren              llama-bench : add pp+tg test type (#7199)
2024-05-05  kunnis              Adding support for the --numa argument for llama-bench. (#7080)
2024-04-30  Georgi Gerganov     ggml : add Flash Attention (#5021)
2024-04-16  Justine Tunney      ggml : add llamafile sgemm (#6414)
2024-03-26  slaren              cuda : rename build flag to LLAMA_CUDA (#6299)
2024-03-21  Kawrakow            Add ability to use Q5_0, Q5_1, and IQ4_NL for quantized K cache (#6183)
2024-03-18  slaren              backend : offload large batches to GPU (#6083)
2024-03-15  slaren              llama-bench : use random tokens to improve accuracy with mixtral (#6069)
2024-03-14  Steve Grubb         gguf : fix resource leaks (#6061)
2024-03-13  slaren              llama : add pipeline parallelism support (#6017)
2024-03-07  Georgi Gerganov     llama-bench : add embeddings option (#5924)
2024-03-02  Neo Zhang Jianyu    Support multiple GPUs (split mode) on SYCL backend (#5806)
2024-03-01  Pierrick Hymbert    llama : cleanup unused mmq flags (#5772)
2024-02-25  Georgi Gerganov     code : normalize enum names (#5697)
2024-02-16  bmwl                ggml : add numa options (#5377)
2024-02-03  Michael Klimenko    refactor : switch to emplace_back to avoid extra object (#5291)
2024-02-01  Neo Zhang Jianyu    add --no-mmap in llama-bench (#5257)
2024-01-31  Georgi Gerganov     llama : remove LLAMA_MAX_DEVICES and LLAMA_SUPPORTS_GPU_OFFLOAD (#5240)
2024-01-30  Jared Van Bortel    kompute : llama-bench support and ggml_cpu_has_kompute() (#5226)
2024-01-28  0cc4m               ggml : add Vulkan backend (#2059)
2024-01-12  slaren              llama : ggml-backend integration (#4766)
2024-01-07  slaren              llama-bench : add no-kv-offload parameter (#4812)
2023-12-07  Georgi Gerganov     llama : per-layer KV cache + quantum K cache (#4309)
2023-11-02  cebtenzzre          build : link against build info instead of compiling against it (#3879)
2023-10-29  Kerfuffle           Extend llama_kv_cache_seq_rm to allow matching any sequence (#3843)
2023-10-23  Marcus Dunn         llama : remove token functions with `context` args in favor of `model` (#3720)
2023-09-28  Cebtenzzre          build : enable more non-default compiler warnings (#3200)
2023-09-28  slaren              llama.cpp : split llama_context_params into model and context params (#3301)
2023-09-28  Georgi Gerganov     llama : custom attention mask + parallel decoding + no context swaps (#3228)
2023-09-27  Rickard Hallerbäck  metal : reusing llama.cpp logging (#3152)
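
Several of the commits above added llama-bench command-line options: the combined pp+tg test type (-pg, #7199), the --numa argument (#7080), and a separate stderr printer (-oe, #7722). A minimal invocation sketch combining them is below; the model path is a placeholder, and the exact option syntax should be checked against ./llama-bench --help for this fork:

    # Time 512-token prompt processing and 128-token generation separately,
    # plus one combined pp+tg run (#7199); distribute threads across NUMA
    # nodes (#7080); print a markdown table to stdout and JSON to stderr
    # (#7722).
    ./llama-bench -m ./models/model.gguf -p 512 -n 128 -pg 512,128 \
        --numa distribute -o md -oe json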