Age | Commit message | Author
2024-06-22 | iqk_mul_mat: delete unused stuff | Iwan Kawrakow
2024-06-22 | iqk_mul_mat: add q8_0 | Iwan Kawrakow
2024-06-22 | iqk_mul_mat: fp16 tweaks | Iwan Kawrakow
2024-06-22 | iqk_mul_mat: fp16 implementation cleanup | Iwan Kawrakow
2024-06-22 | iqk_mul_mat: fp16 implementation for AVX2 | Iwan Kawrakow
2024-06-22 | iqk_mul_mat: multi-thread quantization also for MoE models | Iwan Kawrakow
2024-06-22 | iqk_mul_mat: make it independent of sgemm | Iwan Kawrakow
2024-06-22 | iqk_mul_mat: minor improvements | Iwan Kawrakow
2024-06-22 | iqk_mul_mat: no more templates in the IQ dequantizers | Iwan Kawrakow
2024-06-22 | iqk_mul_mat: remove template on one of the prepare() functions | Iwan Kawrakow
2024-06-22 | iqk_mul_mat: experimenting with zen4 | Iwan Kawrakow
2024-06-22 | iqk_mul_mat: experimenting with zen4 (iq2_xxs) | Iwan Kawrakow
2024-06-22 | iqk_mul_mat: experimenting with zen4 (iq2_xs) | Iwan Kawrakow
2024-06-22 | iqk_mul_mat: experimenting with zen4 (iq3_s and iq2_m) | Iwan Kawrakow
2024-06-22 | iqk_mul_mat: small improvement for iq3_s | Iwan Kawrakow
2024-06-22 | iqk_mul_mat: better AVX2 implementation for iq2_xxs | Iwan Kawrakow
2024-06-22 | iqk_mul_mat: better AVX2 implementation for iq2_xxs | Iwan Kawrakow
2024-06-22 | iqk_mul_mat: AVX2 implementation for iq2_xxs | Iwan Kawrakow
2024-06-22 | iqk_mul_mat: AVX2 implementation for iq2_xs | Iwan Kawrakow
2024-06-22 | iqk_mul_mat: AVX2 implementation for iq2_s | Iwan Kawrakow
2024-06-22 | Separate templates for TG and PP for i-quants on AVX2 | Iwan Kawrakow
2024-06-22 | iqk_mul_mat: AVX2 implementation for iq3_xxs | Iwan Kawrakow
2024-06-22 | iqk_mul_mat: AVX2 implementation for iq3_s | Iwan Kawrakow
2024-06-22 | Cleanup - Arm i-quants should be good now | Iwan Kawrakow
2024-06-22 | iqk_mul_mat: Arm implementation for iq3_s (llama.cpp version) | Iwan Kawrakow
2024-06-22 | Simplify | Iwan Kawrakow
2024-06-22 | iqk_mul_mat: Arm implementation for iq3_xxs (llama.cpp version) | Iwan Kawrakow
2024-06-22 | iqk_mul_mat: Arm implementation for iq2_xs (llama.cpp version) | Iwan Kawrakow
2024-06-22 | iqk_mul_mat: Arm implementation for iq2_s (llama.cpp version) | Iwan Kawrakow
2024-06-22 | Add Q8_0 | Iwan Kawrakow
2024-06-22 | Cosmetics | Iwan Kawrakow
2024-06-22 | iqk_mul_mat: Arm implementation for iq2_xxs (llama.cpp version) | Iwan Kawrakow
2024-06-22 | iqk_mul_mat: faster q3_K TG | Iwan Kawrakow
2024-06-22 | iqk_mul_mat for llama.cpp | Iwan Kawrakow
2024-06-21 | JSON Schema to GBNF integration tests (#7790) | Clint Herron
2024-06-21 | vulkan: detect multiple devices by deviceUUID instead of deviceID (#8022) | k.h.lai
2024-06-21 | ggml : AVX IQ quants (#7845) | Eve
2024-06-21 | llama : optimize long word tokenization with WPM (#8034) | Georgi Gerganov
2024-06-21 | llama : allow pooled embeddings on any model (#7477) | Douglas Hanley
2024-06-21 | swiftui : enable stream updating (#7754) | Shuichi Tsutsumi
2024-06-20 | requirements : Bump torch and numpy for python3.12 (#8041) | Hamdoud Hakem
2024-06-20 | convert-hf : Fix the encoding in the convert-hf-to-gguf-update.py (#8040) | Hamdoud Hakem
2024-06-20 | common: fix warning (#8036) | Johannes Gäßler
2024-06-20 | [SYCL] Fix windows build and inference (#8003) | luoyu-intel
2024-06-20 | CUDA: stream-k decomposition for MMQ (#8018) | Johannes Gäßler
2024-06-20 | metal : fix `ggml_metal_supports_op` for BF16 (#8021) | Michael de Gans
2024-06-20 | server : fix smart slot selection (#8020) | sasha0552
2024-06-19 | un-ignore `build-info.cmake` and `build-info.sh` (#7996) | Michael de Gans
2024-06-19 | ggml : synchronize threads using barriers (#7993) | slaren
2024-06-19 | codecov : remove (#8004) | Georgi Gerganov