path: root/src
Age        | Commit message | Author
2024-10-20 | Avoid rebuild of GGML graph for each token (#98) | agray3
2024-10-19 | Bitnet: make the scale tensors optional (#97) | Kawrakow
2024-10-19 | Quant strategies: attn_q Q4 & attn_v Q6 for Llama 3.1 Q5_K_S (#96) | Nexes the Elder
2024-10-18 | CLI - Specify GGML_TYPE to quantize for the main tensors. (#91) | Nexes the Elder
2024-10-16 | Adding IQ4_KSS: 4.0 bpw quants (#89) | Kawrakow
2024-10-13 | IQ2_KS: 2.1875 bpw non-linear quantization (#85) | Kawrakow
2024-10-11 | Minor: printf -> LLAMA_LOG_INFO | Iwan Kawrakow
2024-10-10 | Better model info (#84) | Kawrakow
2024-10-09 | New SOTA quantization: 4.25 bpw IQ4_KS (#83) | Kawrakow
2024-10-02 | Fused unary(x)*y (#70) | Kawrakow
2024-10-02 | Adding Q6_0 (#77) | Kawrakow
2024-09-29 | Allow bf16 kv-cache (#69) | Kawrakow
2024-09-28 | Time to fix replace_all (#68) | Kawrakow
2024-09-28 | CUDA non-contiguous RoPE (#66) | Kawrakow
2024-09-28 | Adding SWIGLU unary op (#65) | Kawrakow
2024-09-28 | Better sub-3-bit quantization mixes with a qkv tensor (#64) | Kawrakow
2024-09-27 | Adding ability to have meta data per tensor row (#61) | Kawrakow
2024-09-19 | Minor | Iwan Kawrakow
2024-09-14 | Quantization mixes tweaks (#53) | Kawrakow
2024-09-09 | Adding IQ1_TN - 1.6875 bpw for TriLM ternary models (#44) | Kawrakow
2024-09-08 | Adding fused rms_norm (#42) | Kawrakow
2024-08-27 | Faster Gemma2 (#27) | Kawrakow
2024-08-21 | softcap: minor improvement (#24) | Kawrakow
2024-08-20 | Fused soft cap and SIMD-ified GeLU (#9) | Kawrakow
2024-08-20 | iq4_k: use iq5_k also when n_gqa = 2 (#23) | Kawrakow
2024-08-19 | iq2_k: slightly better bpw - accuracy compromise (#20) | Kawrakow
2024-08-12 | Merge mainline - Aug 12 2024 (#17) | Kawrakow
2024-08-09 | iq6_k: WIP (quantize/dequantize) | Iwan Kawrakow
2024-08-07 | Adding IQ2_TN for use with ternary models (#13) | Kawrakow
2024-08-05 | q2_K: allow it to detect ternary nets and quantize accordingly | Iwan Kawrakow
2024-08-01 | iq3_k: Basics | Iwan Kawrakow
2024-08-01 | iq5_k: Basics | Iwan Kawrakow
2024-08-01 | iq2_k: Basics | Iwan Kawrakow
2024-07-28 | IQ4_K: SOTA 4-bit quantization (#6) | Kawrakow
2024-07-27 | Merge mainline llama.cpp (#3) | Kawrakow
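
A note on the fractional bpw (bits per weight) figures quoted in several entries above (4.25 for IQ4_KS, 2.1875 for IQ2_KS, 1.6875 for IQ1_TN): they arise from per-block scales and metadata being amortized over the weights of each block. The sketch below is illustrative arithmetic only, assuming the 256-weight super-blocks (QK_K = 256) used throughout ggml-style quants; the overhead splits are assumptions chosen to reproduce the quoted numbers, not the actual field layouts from the linked PRs.

```cpp
// Illustrative only: how fractional bpw figures arise in block quantization.
// ggml-style quants group weights into 256-element super-blocks (QK_K = 256)
// and store per-block scales/metadata alongside the packed codes.
// The overhead values below are assumptions, not the real IQ2_KS/IQ4_KS layouts.
#include <cstdio>

// bits per weight = (packed code bits + per-block overhead bits) / block size
constexpr double bpw(int block_size, int code_bits, int overhead_bits) {
    return (block_size * code_bits + overhead_bits) / double(block_size);
}

int main() {
    std::printf("2-bit codes + 48 overhead bits: %.4f bpw\n", bpw(256, 2, 48)); // 2.1875 (cf. IQ2_KS)
    std::printf("4-bit codes + 64 overhead bits: %.4f bpw\n", bpw(256, 4, 64)); // 4.2500 (cf. IQ4_KS)
    return 0;
}
```

Under this accounting, the flat 4.0 bpw of IQ4_KSS implies no separate per-super-block overhead on top of the 4-bit codes; how that is achieved is described in PR #89.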