path: root/examples
Age | Commit message | Author
2024-12-03 | Q8_0_R4 (#120) | Kawrakow
2024-12-02 | Q4_0_R4 (#119) | Kawrakow
2024-12-02 | IQ4_NL_X4 (#118) | Kawrakow
2024-10-25 | Bitnet changes (#106) | Kawrakow
2024-10-18 | CLI - Specify GGML_TYPE to quantize for the main tensors. (#91) | Nexes the Elder
2024-10-16 | Adding IQ4_KSS: 4.0 bpw quants (#89) | Kawrakow
2024-10-13 | IQ2_KS: 2.1875 bpw non-linear quantization (#85) | Kawrakow
2024-10-10 | Better model info (#84) | Kawrakow
2024-10-09 | New SOTA quantization: 4.25 bpw IQ4_KS (#83) | Kawrakow
2024-10-02 | Adding Q6_0 (#77) | Kawrakow
2024-09-27 | Adding ability to have meta data per tensor row (#61) | Kawrakow
2024-09-09 | Adding IQ1_TN - 1.6875 bpw for TriLM ternary models (#44) | Kawrakow
2024-09-05 | Zen4 Flash Attention - bf16 support (#38) | Kawrakow
2024-08-20 | Fused soft cap and SIMD-ified GeLU (#9) | Kawrakow
2024-08-19 | quantize_stats: print rmse and max error as fraction of <x> (#21) | Kawrakow
2024-08-12 | Merge mainline - Aug 12 2024 (#17) | Kawrakow
2024-08-09 | iq6_k: WIP (quantize/dequantize) | Iwan Kawrakow
2024-08-07 | Adding IQ2_TN for use with ternary models (#13) | Kawrakow
2024-08-05 | q2_K: allow it to detect ternary nets and quantize accordingly | Iwan Kawrakow
2024-08-01 | iq3_k: Basics | Iwan Kawrakow
2024-08-01 | iq5_k: Basics | Iwan Kawrakow
2024-08-01 | iq2_k: Basics | Iwan Kawrakow
2024-07-28 | IQ4_K: SOTA 4-bit quantization (#6) | Kawrakow
2024-07-27 | Merge mainline llama.cpp (#3) | Kawrakow
2024-07-24 | Add copyright notices | Iwan Kawrakow
2024-06-26 | imatrix: be able to specify the name of the output tensor | Iwan Kawrakow
2024-06-24 | Bitnet: tiny bity faster 1.625 bpw variant on Metal | Iwan Kawrakow
2024-06-22 | bitnet: add 2 bpw quantization | Iwan Kawrakow
2024-06-22 | bitnet: CUDA, scalar, AVX2 | Iwan Kawrakow
2024-06-21 | llama : allow pooled embeddings on any model (#7477) | Douglas Hanley
2024-06-21 | swiftui : enable stream updating (#7754) | Shuichi Tsutsumi
2024-06-20 | [SYCL] Fix windows build and inference (#8003) | luoyu-intel
2024-06-20 | server : fix smart slot selection (#8020) | sasha0552
2024-06-18 | Only use FIM middle token if it exists (#7648) | Sigbjørn Skjæret
2024-06-17 | Add support for sqrt on CUDA (#7953) | Calvin Laurenson
2024-06-15 | Add `cvector-generator` example (#7514) | Xuan Son Nguyen
2024-06-14 | llama-bench : fix RPC indication (#7936) | Radoslav Gerganov
2024-06-13 | move BLAS to a separate backend (#6210) | slaren
2024-06-13 | `build`: rename main → llama-cli, server → llama-server, llava-cli → ll... | Olivier Chafik
2024-06-12 | server : restore numeric prompts (#7883) | Georgi Gerganov
2024-06-11 | llama-bench: more compact markdown tables (#7879) | Johannes Gäßler
2024-06-11 | json: refine constraint for whitespace to avoid runaways yet allow pretty pri... | Olivier Chafik
2024-06-11 | `json`: document schema conversion in GBNF readme, align manual grammar examp... | Olivier Chafik
2024-06-10 | examples : remove --instruct remnants (#7846) | Georgi Gerganov
2024-06-10 | server : improve "prompt" handling (#7847) | Georgi Gerganov
2024-06-09 | imatrix : handle partial entries (#7833) | Georgi Gerganov
2024-06-09 | server: do not remove whitespace at the start of a completion chunk (#7830) | mgroeber9110
2024-06-09 | Revert "[SYCL] Update rpc-server.cpp to include SYCL backend (#7682)" (#7808) | slaren
2024-06-08 | server : smart slot selection using Longest Common Prefix (#7728) | sasha0552
2024-06-07 | gguf-split : change binary multi-byte units to decimal (#7803) | Christian Zhou-Zheng