ik_llama.cpp.git
main
Age
Commit message
Author
2024-06-22
iqk_mul_mat: delete unused stuff
Iwan Kawrakow
2024-06-22
iqk_mul_mat: add q8_0
Iwan Kawrakow
2024-06-22
iqk_mul_mat: fp16 tweaks
Iwan Kawrakow
2024-06-22
iqk_mul_mat: fp16 implementation cleanup
Iwan Kawrakow
2024-06-22
iqk_mul_mat: fp16 implementation for AVX2
Iwan Kawrakow
2024-06-22
iqk_mul_mat: multi-thread quantization also for MoE models
Iwan Kawrakow
2024-06-22
iqk_mul_mat: make it independent of sgemm
Iwan Kawrakow
2024-06-22
iqk_mul_mat: minor improvements
Iwan Kawrakow
2024-06-22
iqk_mul_mat: no more templates in the IQ dequantizers
Iwan Kawrakow
2024-06-22
iqk_mul_mat: remove template on one of the prepare() functions
Iwan Kawrakow
2024-06-22
iqk_mul_mat: experimenting with zen4
Iwan Kawrakow
2024-06-22
iqk_mul_mat: experimenting with zen4 (iq2_xxs)
Iwan Kawrakow
2024-06-22
iqk_mul_mat: experimenting with zen4 (iq2_xs)
Iwan Kawrakow
2024-06-22
iqk_mul_mat: experimenting with zen4 (iq3_s and iq2_m)
Iwan Kawrakow
2024-06-22
iqk_mul_mat: small improvement for iq3_s
Iwan Kawrakow
2024-06-22
iqk_mul_mat: better AVX2 implementation for iq2_xxs
Iwan Kawrakow
2024-06-22
iqk_mul_mat: better AVX2 implementation for iq2_xxs
Iwan Kawrakow
2024-06-22
iqk_mul_mat: AVX2 implementation for iq2_xxs
Iwan Kawrakow
2024-06-22
iqk_mul_mat: AVX2 implementation for iq2_xs
Iwan Kawrakow
2024-06-22
iqk_mul_mat: AVX2 implementation for iq2_s
Iwan Kawrakow
2024-06-22
Separate templates for TG and PP for i-quants on AVX2
Iwan Kawrakow
2024-06-22
iqk_mul_mat: AVX2 implementation for iq3_xxs
Iwan Kawrakow
2024-06-22
iqk_mul_mat: AVX2 implementation for iq3_s
Iwan Kawrakow
2024-06-22
Cleanup - Arm i-quants should be good now
Iwan Kawrakow
2024-06-22
iqk_mul_mat: Arm implementation for iq3_s (llama.cpp version)
Iwan Kawrakow
2024-06-22
Simplify
Iwan Kawrakow
2024-06-22
iqk_mul_mat: Arm implementation for iq3_xxs (llama.cpp version)
Iwan Kawrakow
2024-06-22
iqk_mul_mat: Arm implementation for iq2_xs (llama.cpp version)
Iwan Kawrakow
2024-06-22
iqk_mul_mat: Arm implementation for iq2_s (llama.cpp version)
Iwan Kawrakow
2024-06-22
Add Q8_0
Iwan Kawrakow
2024-06-22
Cosmetics
Iwan Kawrakow
2024-06-22
iqk_mul_mat: Arm implementation for iq2_xxs (llama.cpp version)
Iwan Kawrakow
2024-06-22
iqk_mul_mat: faster q3_K TG
Iwan Kawrakow
2024-06-22
iqk_mul_mat for llama.cpp
Iwan Kawrakow
2024-06-21
JSON Schema to GBNF integration tests (#7790)
Clint Herron
2024-06-21
vulkan: detect multiple devices by deviceUUID instead of deviceID (#8022)
k.h.lai
2024-06-21
ggml : AVX IQ quants (#7845)
Eve
2024-06-21
llama : optimize long word tokenization with WPM (#8034)
Georgi Gerganov
2024-06-21
llama : allow pooled embeddings on any model (#7477)
Douglas Hanley
2024-06-21
swiftui : enable stream updating (#7754)
Shuichi Tsutsumi
2024-06-20
requirements : Bump torch and numpy for python3.12 (#8041)
Hamdoud Hakem
2024-06-20
convert-hf : Fix the encoding in the convert-hf-to-gguf-update.py (#8040)
Hamdoud Hakem
2024-06-20
common: fix warning (#8036)
Johannes Gäßler
2024-06-20
[SYCL] Fix windows build and inference (#8003)
luoyu-intel
2024-06-20
CUDA: stream-k decomposition for MMQ (#8018)
Johannes Gäßler
2024-06-20
metal : fix `ggml_metal_supports_op` for BF16 (#8021)
Michael de Gans
2024-06-20
server : fix smart slot selection (#8020)
sasha0552
2024-06-19
un-ignore `build-info.cmake` and `build-info.sh` (#7996)
Michael de Gans
2024-06-19
ggml : synchronize threads using barriers (#7993)
slaren
2024-06-19
codecov : remove (#8004)
Georgi Gerganov