path: root/examples/quantize/quantize.cpp
Age        | Commit message | Author
2024-12-17 | IQ2_K_R4 (#146) | Kawrakow
2024-12-17 | IQ3_K_R4 (#145) | Kawrakow
2024-12-15 | BF16_R16 - 16 interleaved bf16 rows (#142) | Kawrakow
2024-12-14 | Q8_K_R8: Fastest quantized matrix multiplications (#141) | Kawrakow
2024-12-12 | IQ4_K_R4 (#138) | Kawrakow
2024-12-11 | Q2_K_R4 (#136) | Kawrakow
2024-12-11 | Q3_K_R4 (#134) | Kawrakow
2024-12-10 | Q5_K_R4 (#132) | Kawrakow
2024-12-10 | Q6_K_R4 (#130) | Kawrakow
2024-12-09 | Q4_K_R4 (#129) | Kawrakow
2024-12-08 | Rename iq4_nl_x4 to iq4_nl_r4 (#126) | Kawrakow
2024-12-06 | iq2_bn_r4: fastest Bitnet CPU implementation on the planet (#124) | Kawrakow
2024-12-04 | IQ4_XS_R4 (#123) | Kawrakow
2024-12-03 | Q6_0_R4 (#122) | Kawrakow
2024-12-03 | Q5_0_R4 (#121) | Kawrakow
2024-12-03 | Q8_0_R4 (#120) | Kawrakow
2024-12-02 | Q4_0_R4 (#119) | Kawrakow
2024-12-02 | IQ4_NL_X4 (#118) | Kawrakow
2024-10-25 | Bitnet changes (#106) | Kawrakow
2024-10-18 | CLI - Specify GGML_TYPE to quantize for the main tensors. (#91) | Nexes the Elder
2024-10-16 | Adding IQ4_KSS: 4.0 bpw quants (#89) | Kawrakow
2024-10-13 | IQ2_KS: 2.1875 bpw non-linear quantization (#85) | Kawrakow
2024-10-09 | New SOTA quantization: 4.25 bpw IQ4_KS (#83) | Kawrakow
2024-10-02 | Adding Q6_0 (#77) | Kawrakow
2024-09-27 | Adding ability to have meta data per tensor row (#61) | Kawrakow
2024-09-09 | Adding IQ1_TN - 1.6875 bpw for TriLM ternary models (#44) | Kawrakow
2024-08-12 | Merge mainline - Aug 12 2024 (#17) | Kawrakow
2024-08-09 | iq6_k: WIP (quantize/dequantize) | Iwan Kawrakow
2024-08-07 | Adding IQ2_TN for use with ternary models (#13) | Kawrakow
2024-08-05 | q2_K: allow it to detect ternary nets and quantize accordingly | Iwan Kawrakow
2024-08-01 | iq3_k: Basics | Iwan Kawrakow
2024-08-01 | iq5_k: Basics | Iwan Kawrakow
2024-08-01 | iq2_k: Basics | Iwan Kawrakow
2024-07-28 | IQ4_K: SOTA 4-bit quantization (#6) | Kawrakow
2024-07-27 | Merge mainline llama.cpp (#3) | Kawrakow
2024-06-24 | Bitnet: tiny bity faster 1.625 bpw variant on Metal | Iwan Kawrakow
2024-06-22 | bitnet: add 2 bpw quantization | Iwan Kawrakow
2024-06-22 | bitnet: CUDA, scalar, AVX2 | Iwan Kawrakow
2024-05-22 | common : normalize naming style (#7462) | Georgi Gerganov
2024-05-19 | quantize : fix --keep-split check (#7374) | Fred Douglas
2024-05-08 | ggml : introduce bfloat16 support (#6412) | Justine Tunney
2024-04-26 | quantize: add imatrix and dataset metadata in GGUF (#6658) | Pierrick Hymbert
2024-04-25 | quantize : add '--keep-split' to quantize model into shards (#6688) | jiez
2024-04-03 | ggml : mul_mat_id use the same tensor for all the experts (#6387) | slaren
2024-03-26 | IQ1_M: 1.75 bpw quantization (#6302) | Kawrakow
2024-03-26 | quantize : be able to override metadata by key (#6321) | Kawrakow
2024-03-22 | quantize: options for output and token embedding tensors qtype (#6239) | Kawrakow
2024-02-27 | IQ4_XS: a 4.25 bpw quantization (#5747) | Kawrakow
2024-02-26 | Adding IQ2_S and IQ2_M to complete coverage of the 2-3 bit quantization range... | Kawrakow
2024-02-24 | IQ3_S: a much better alternative to Q3_K (#5676) | Kawrakow