path: root/gguf-py/gguf/quants.py
Age | Commit message | Author
2025-06-03 | convert_hf_to_gguf.py : conversion from hf weights to Q6_0 (#483) | Nexes the Elder
  * Direct conversion from fp16 to Q6_0
  * Forgotten comma
  * More precise info
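For context, Q6_0 is a "type-0" block format in the style of ggml's Q4_0/Q5_0: each block of 32 weights stores one fp16 scale plus 6-bit codes. Below is a minimal per-block sketch, assuming the scale convention of the other type-0 formats (the largest-magnitude value maps to the negative end of the code range); the bit-packing of the real on-disk layout is deliberately omitted and the function name is hypothetical:

```python
import numpy as np

QK6_0 = 32  # block size, assumed to match ggml's other "type-0" formats

def quantize_q6_0_block(block: np.ndarray) -> tuple[np.float16, np.ndarray]:
    """Toy Q6_0-style quantization of one block: a scale d and 6-bit codes q,
    dequantized as d * (q - 32). On-disk bit-packing is omitted here."""
    assert block.shape == (QK6_0,)
    # choose the scale so the largest-magnitude value maps to -32,
    # the negative end of the signed 6-bit range [-32, 31]
    max_val = block[np.argmax(np.abs(block))]
    if max_val == 0:
        return np.float16(0.0), np.zeros(QK6_0, dtype=np.uint8)
    d = max_val / -32
    q = np.clip(np.round(block / d + 32), 0, 63).astype(np.uint8)
    return np.float16(d), q
```

With this convention the largest-magnitude weight is reproduced exactly on dequantization, and every other weight incurs at most d/2 of rounding error.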
2024-08-12 | Merge mainline - Aug 12 2024 (#17) | Kawrakow
  * Merge mainline
  * Fix after merge
  * Remove CI check
  Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2024-07-27 | Merge mainline llama.cpp (#3) | Kawrakow
  * Merging mainline - WIP
  * Merging mainline - WIP
    AVX2 and CUDA appear to work. CUDA performance seems slightly (~1-2%) lower, as is so often the case with llama.cpp/ggml after some "improvements" have been made.
  * Merging mainline - fix Metal
  * Remove check
  Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2024-05-25 | gguf-py : fix and simplify quantized shape round-trip (#7483) | compilade
  * gguf-py : fix and simplify quantized shape round-trip
  * gguf-py : remove unused import
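The round-trip in question maps a tensor's shape in elements to the shape of its quantized byte buffer and back. A sketch close to what gguf-py exposes for this, assuming the gguf-py convention that GGML_QUANT_SIZES maps each quant type to its (block_size, type_size) pair, i.e. elements per block and bytes per serialized block:

```python
from gguf.constants import GGML_QUANT_SIZES, GGMLQuantizationType

def quant_shape_to_byte_shape(shape: tuple[int, ...], qtype: GGMLQuantizationType) -> tuple[int, ...]:
    # each block of `block_size` elements serializes to `type_size` bytes
    block_size, type_size = GGML_QUANT_SIZES[qtype]
    if shape[-1] % block_size != 0:
        raise ValueError(f"last dim ({shape[-1]}) is not a multiple of the block size ({block_size})")
    return (*shape[:-1], shape[-1] // block_size * type_size)

def quant_shape_from_byte_shape(shape: tuple[int, ...], qtype: GGMLQuantizationType) -> tuple[int, ...]:
    # inverse mapping: bytes back to element count
    block_size, type_size = GGML_QUANT_SIZES[qtype]
    if shape[-1] % type_size != 0:
        raise ValueError(f"last dim ({shape[-1]}) is not a multiple of the type size ({type_size})")
    return (*shape[:-1], shape[-1] // type_size * block_size)
```

For example, a (4096, 4096) tensor quantized to Q8_0 (blocks of 32 elements, 34 bytes each) round-trips through the byte shape (4096, 4352).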
2024-05-13 | convert-hf : support direct Q8_0 conversion (#7234) | compilade
  * convert-hf : support q8_0 conversion
  * convert-hf : add missing ftype
    This was messing with the checksums otherwise.
  * convert-hf : add missing ftype to Baichuan and Xverse
    I didn't notice these on my first pass.
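For reference, a Q8_0 block is 32 weights stored as one fp16 scale followed by 32 int8 quants (34 bytes). A minimal numpy sketch of a direct conversion along those lines (an illustration, not the exact gguf-py implementation):

```python
import numpy as np

QK8_0 = 32  # elements per Q8_0 block: fp16 scale + 32 int8 quants = 34 bytes

def quantize_q8_0(data: np.ndarray) -> np.ndarray:
    # row size must be a multiple of the block size
    blocks = data.astype(np.float32).reshape(-1, QK8_0)
    # per-block scale: the largest magnitude maps to the int8 limit 127
    d = np.abs(blocks).max(axis=-1, keepdims=True) / 127
    q = np.round(blocks / np.where(d == 0, 1, d)).astype(np.int8)
    # serialize each block: 2 bytes of fp16 scale, then 32 quant bytes
    return np.concatenate([d.astype(np.float16).view(np.uint8), q.view(np.uint8)], axis=-1)
```

The resulting uint8 buffer can then be written out with the byte shape produced by quant_shape_to_byte_shape above, which is what lets the converter skip the fp32 intermediate entirely.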