ik_llama.cpp.git
main
path: root/examples
Age | Commit message | Author
2024-12-03 | Q8_0_R4 (#120) | Kawrakow
2024-12-02 | Q4_0_R4 (#119) | Kawrakow
2024-12-02 | IQ4_NL_X4 (#118) | Kawrakow
2024-10-25 | Bitnet changes (#106) | Kawrakow
2024-10-18 | CLI - Specify GGML_TYPE to quantize for the main tensors. (#91) | Nexes the Elder
2024-10-16 | Adding IQ4_KSS: 4.0 bpw quants (#89) | Kawrakow
2024-10-13 | IQ2_KS: 2.1875 bpw non-linear quantization (#85) | Kawrakow
2024-10-10 | Better model info (#84) | Kawrakow
2024-10-09 | New SOTA quantization: 4.25 bpw IQ4_KS (#83) | Kawrakow
2024-10-02 | Adding Q6_0 (#77) | Kawrakow
2024-09-27 | Adding ability to have meta data per tensor row (#61) | Kawrakow
2024-09-09 | Adding IQ1_TN - 1.6875 bpw for TriLM ternary models (#44) | Kawrakow
2024-09-05 | Zen4 Flash Attention - bf16 support (#38) | Kawrakow
2024-08-20 | Fused soft cap and SIMD-ified GeLU (#9) | Kawrakow
2024-08-19 | quantize_stats: print rmse and max error as fraction of <x> (#21) | Kawrakow
2024-08-12 | Merge mainline - Aug 12 2024 (#17) | Kawrakow
2024-08-09 | iq6_k: WIP (quantize/dequantize) | Iwan Kawrakow
2024-08-07 | Adding IQ2_TN for use with ternary models (#13) | Kawrakow
2024-08-05 | q2_K: allow it to detect ternary nets and quantize accordingly | Iwan Kawrakow
2024-08-01 | iq3_k: Basics | Iwan Kawrakow
2024-08-01 | iq5_k: Basics | Iwan Kawrakow
2024-08-01 | iq2_k: Basics | Iwan Kawrakow
2024-07-28 | IQ4_K: SOTA 4-bit quantization (#6) | Kawrakow
2024-07-27 | Merge mainline llama.cpp (#3) | Kawrakow
2024-07-24 | Add copyright notices | Iwan Kawrakow
2024-06-26 | imatrix: be able to specify the name of the output tensor | Iwan Kawrakow
2024-06-24 | Bitnet: tiny bity faster 1.625 bpw variant on Metal | Iwan Kawrakow
2024-06-22 | bitnet: add 2 bpw quantization | Iwan Kawrakow
2024-06-22 | bitnet: CUDA, scalar, AVX2 | Iwan Kawrakow
2024-06-21 | llama : allow pooled embeddings on any model (#7477) | Douglas Hanley
2024-06-21 | swiftui : enable stream updating (#7754) | Shuichi Tsutsumi
2024-06-20 | [SYCL] Fix windows build and inference (#8003) | luoyu-intel
2024-06-20 | server : fix smart slot selection (#8020) | sasha0552
2024-06-18 | Only use FIM middle token if it exists (#7648) | Sigbjørn Skjæret
2024-06-17 | Add support for sqrt on CUDA (#7953) | Calvin Laurenson
2024-06-15 | Add `cvector-generator` example (#7514) | Xuan Son Nguyen
2024-06-14 | llama-bench : fix RPC indication (#7936) | Radoslav Gerganov
2024-06-13 | move BLAS to a separate backend (#6210) | slaren
2024-06-13 | `build`: rename main → llama-cli, server → llama-server, llava-cli → ll... | Olivier Chafik
2024-06-12 | server : restore numeric prompts (#7883) | Georgi Gerganov
2024-06-11 | llama-bench: more compact markdown tables (#7879) | Johannes Gäßler
2024-06-11 | json: refine constraint for whitespace to avoid runaways yet allow pretty pri... | Olivier Chafik
2024-06-11 | `json`: document schema conversion in GBNF readme, align manual grammar examp... | Olivier Chafik
2024-06-10 | examples : remove --instruct remnants (#7846) | Georgi Gerganov
2024-06-10 | server : improve "prompt" handling (#7847) | Georgi Gerganov
2024-06-09 | imatrix : handle partial entries (#7833) | Georgi Gerganov
2024-06-09 | server: do not remove whitespace at the start of a completion chunk (#7830) | mgroeber9110
2024-06-09 | Revert "[SYCL] Update rpc-server.cpp to include SYCL backend (#7682)" (#7808) | slaren
2024-06-08 | server : smart slot selection using Longest Common Prefix (#7728) | sasha0552
2024-06-07 | gguf-split : change binary multi-byte units to decimal (#7803) | Christian Zhou-Zheng