ik_llama.cpp.git : log (branch: main)
Date       | Commit message                                                     | Author
2024-01-16 | examples : fix and improve docs for the grammar generator (#4909) | Maximilian Winter
2024-01-16 | ggml : introduce GGML_CALL function annotation (#4850) | Justine Tunney
2024-01-16 | finetune : use LLAMA_FILE_MAGIC_GGLA (#4961) | Daniel Bevenius
2024-01-16 | speculative : threading options (#4959) | stduhpf
2024-01-15 | pass cpu-architecture arguments only to host code (C;C++) (#4943) | ngc92
2024-01-15 | llama : apply classifier-free guidance to logits directly (#4951) | David Friehs
2024-01-15 | awq-py : fix typo in awq-py/README.md (#4947) | Victor Z. Peng
2024-01-15 | cuda : fix dequantize kernel names (#4938) | Georgi Gerganov
2024-01-15 | llama : check for 256 divisibility for IQ2_XS, IQ2_XXS (#4950) | Kawrakow
2024-01-15 | CUDA: faster dequantize kernels for Q4_0 and Q4_1 (#4938) | Kawrakow
2024-01-14 | llama : fix missing quotes (#4937) | David Pflug
2024-01-14 | Add ability to use importance matrix for all k-quants (#4930) | Kawrakow
2024-01-14 | llama : check LLAMA_TRACE env for extra logging (#4929) | Georgi Gerganov
2024-01-14 | scripts : sync-ggml-am.sh option to skip commits | Georgi Gerganov
2024-01-14 | llama : use LLAMA_LOG_ macros for logging | Georgi Gerganov
2024-01-14 | Fix ffn_down quantization mix for MoE models (#4927) | Kawrakow
2024-01-14 | metal : correctly set SIMD support flags on iOS (#4923) | Alex Azarov
2024-01-14 | llama : support WinXP build with MinGW 8.1.0 (#3419) | Karthik Kumar Viswanathan
2024-01-14 | 2-bit quantizations (#4897) | Kawrakow
2024-01-14 | Make Q3_K_S be the same as old Q3_K_L for Mixtral-8x7B (#4906) | Kawrakow
2024-01-14 | sync : ggml | Georgi Gerganov
2024-01-13 | ggml: cache sin/cos for RoPE (#4908) | Johannes Gäßler
2024-01-13 | metal : remove old API (#4919) | Georgi Gerganov
2024-01-13 | server : fix prompt caching with system prompt (#4914) | Georgi Gerganov
2024-01-13 | llama : fix detokenization of non-special added-tokens (#4916) | Georgi Gerganov
2024-01-13 | metal : disable log for loaded kernels (#4794) | Georgi Gerganov
2024-01-13 | llama : minimize size used for state save/load (#4820) | David Friehs
2024-01-13 | workflows: unbreak nix-build-aarch64, and split it out (#4915) | Someone
2024-01-13 | main : add parameter --no-display-prompt (#4541) | Yann Follet
2024-01-13 | gguf : fix potential infinite for-loop (#4600) | texmex76
2024-01-13 | metal : refactor kernel loading code (#4794) | Georgi Gerganov
2024-01-13 | compare-llama-bench: tweak output format (#4910) | Johannes Gäßler
2024-01-13 | server : fix deadlock that occurs in multi-prompt scenarios (#4905) | Ziad Ben Hadj-Alouane
2024-01-13 | server : fix crash with multimodal models without BOS token (#4904) | makomk
2024-01-13 | convert : update phi-2 to latest HF repo (#4903) | Georgi Gerganov
2024-01-12 | sync : ggml | Georgi Gerganov
2024-01-12 | ggml : fix 32-bit ARM compat for IQ2_XS (whisper/1758) | Georgi Gerganov
2024-01-12 | backend_sched : fix assignments | slaren
2024-01-12 | examples : add pydantic models to GBNF grammar generator (#4883) | Maximilian Winter
2024-01-12 | CUDA: faster q8_0 -> f16 dequantization (#4895) | Johannes Gäßler
2024-01-12 | llama : ggml-backend integration (#4766) | slaren
2024-01-12 | llama : remove redundant assert for StableLM (#4901) | Georgi Gerganov
2024-01-12 | export-lora : use LLAMA_FILE_MAGIC_GGLA (#4894) | Daniel Bevenius
2024-01-12 | llama.swiftui : update models layout (#4826) | Zay
2024-01-12 | gitignore : imatrix | Georgi Gerganov
2024-01-12 | CUDA: fix softmax compile for old CUDA versions (#4862) | Johannes Gäßler
2024-01-12 | llama : fix typo "imp_embd" -> "inp_embd" | Georgi Gerganov
2024-01-12 | common : streamline the formatting of help (#4890) | howlger
2024-01-12 | py : fix lint (#4889) | Georgi Gerganov
2024-01-12 | llama : fix llm_build_k_shift to use correct n_rot (#4889) | Georgi Gerganov