path: root/examples/quantize/quantize.cpp
Age | Commit message | Author
2024-10-16 | Adding IQ4_KSS: 4.0 bpw quants (#89) | Kawrakow
2024-10-13 | IQ2_KS: 2.1875 bpw non-linear quantization (#85) | Kawrakow
2024-10-09 | New SOTA quantization: 4.25 bpw IQ4_KS (#83) | Kawrakow
2024-10-02 | Adding Q6_0 (#77) | Kawrakow
2024-09-27 | Adding ability to have meta data per tensor row (#61) | Kawrakow
2024-09-09 | Adding IQ1_TN - 1.6875 bpw for TriLM ternary models (#44) | Kawrakow
2024-08-12 | Merge mainline - Aug 12 2024 (#17) | Kawrakow
2024-08-09 | iq6_k: WIP (quantize/dequantize) | Iwan Kawrakow
2024-08-07 | Adding IQ2_TN for use with ternary models (#13) | Kawrakow
2024-08-05 | q2_K: allow it to detect ternary nets and quantize accordingly | Iwan Kawrakow
2024-08-01 | iq3_k: Basics | Iwan Kawrakow
2024-08-01 | iq5_k: Basics | Iwan Kawrakow
2024-08-01 | iq2_k: Basics | Iwan Kawrakow
2024-07-28 | IQ4_K: SOTA 4-bit quantization (#6) | Kawrakow
2024-07-27 | Merge mainline llama.cpp (#3) | Kawrakow
2024-06-24 | Bitnet: tiny bity faster 1.625 bpw variant on Metal | Iwan Kawrakow
2024-06-22 | bitnet: add 2 bpw quantization | Iwan Kawrakow
2024-06-22 | bitnet: CUDA, scalar, AVX2 | Iwan Kawrakow
2024-05-22 | common : normalize naming style (#7462) | Georgi Gerganov
2024-05-19 | quantize : fix --keep-split check (#7374) | Fred Douglas
2024-05-08 | ggml : introduce bfloat16 support (#6412) | Justine Tunney
2024-04-26 | quantize: add imatrix and dataset metadata in GGUF (#6658) | Pierrick Hymbert
2024-04-25 | quantize : add '--keep-split' to quantize model into shards (#6688) | jiez
2024-04-03 | ggml : mul_mat_id use the same tensor for all the experts (#6387) | slaren
2024-03-26 | IQ1_M: 1.75 bpw quantization (#6302) | Kawrakow
2024-03-26 | quantize : be able to override metadata by key (#6321) | Kawrakow
2024-03-22 | quantize: options for output and token embedding tensors qtype (#6239) | Kawrakow
2024-02-27 | IQ4_XS: a 4.25 bpw quantization (#5747) | Kawrakow
2024-02-26 | Adding IQ2_S and IQ2_M to complete coverage of the 2-3 bit quantization range... | Kawrakow
2024-02-24 | IQ3_S: a much better alternative to Q3_K (#5676) | Kawrakow
2024-02-21 | IQ4_NL: 4-bit non-linear quants with blocks of 32 (#5590) | Kawrakow
2024-02-18 | 1.5 bit quantization (#5453) | Kawrakow
2024-02-16 | ggml : add numa options (#5377) | bmwl
2024-02-03 | refactor : switch to emplace_back to avoid extra object (#5291) | Michael Klimenko
2024-01-30 | SOTA 3-bit quants (#5196) | Kawrakow
2024-01-30 | quantize : fix typo (#5211) | Vladimir Malyutin
2024-01-22 | llama : add Q3_K_XS (#5060) | Kawrakow
2024-01-14 | Add ability to use importance matrix for all k-quants (#4930) | Kawrakow
2024-01-14 | 2-bit quantizations (#4897) | Kawrakow
2024-01-11 | llama : restore intended k-quants mixes for MoE models (#4872) | Kawrakow
2023-11-02 | build : link against build info instead of compiling against it (#3879) | cebtenzzre
2023-10-29 | ggml : quantization refactoring (#3833) | Georgi Gerganov
2023-09-28 | build : enable more non-default compiler warnings (#3200) | Cebtenzzre
2023-09-18 | make : restore build-info.h dependency for several targets (#3205) | Cebtenzzre
2023-09-15 | examples : add compiler version and target to build info (#2998) | Cebtenzzre
2023-09-15 | check C++ code with -Wmissing-declarations (#3184) | Cebtenzzre
2023-09-07 | fix some warnings from gcc and clang-tidy (#3038) | Cebtenzzre
2023-09-01 | Allow quantize to only copy tensors, some other improvements (#2931) | Kerfuffle
2023-08-28 | quantize : make output filename optional again (#2823) | Cebtenzzre
2023-08-23 | Fix values shown in the quantize tool help (#2735) | Kawrakow