Age  Commit message  Author
2024-04-16  convert : fix autoawq gemma (#6704)  Zheng.Deng
2024-04-16  llama : make general.name optional (#6709)  Georgi Gerganov
2024-04-16  ggml : fix llamafile sgemm wdata offsets (#6710)  Georgi Gerganov
2024-04-16  ggml : add llamafile sgemm (#6414)  Justine Tunney
2024-04-16  llama : add StableLM2 12B (#6635)  Ashish
2024-04-16  llama : add qwen2moe (#6074)  Shijie
2024-04-16  gritlm : add --outdir option to hf.sh script (#6699)  Daniel Bevenius
2024-04-16  perplexity : require positive --ctx-size arg (#6695)  Georgi Gerganov
2024-04-16  gguf : add special tokens metadata for FIM/Infill (#6689)  Daniel Bevenius
2024-04-15  `main`: add --json-schema / -j flag (#6659)  Olivier Chafik
2024-04-15  llama : fix restoring the number of outputs from state files (#6687)  compilade
2024-04-15  server : revert "minor layout improvements" (#6684)  Pierrick Hymbert
2024-04-15  swift : linux support (#6590)  Steven Prichard
2024-04-15  fix mul_mat_id() for new input, make the ut pass (#6682)  Neo Zhang Jianyu
2024-04-14  llama : add missing kv clear in llama_beam_search (#6664)  David Renshaw
2024-04-14  Add Command R chat template (#6650)  Chao Jiang
2024-04-14  flake.lock: Update (#6669)  Georgi Gerganov
2024-04-14  Added support for GGML_OP_CLAMP in Metal (#6662)  Dave
2024-04-14  Fix --split-max-size (#6655)  Sigbjørn Skjæret
2024-04-14  [bug fix] convert github repository_owner to lowercase (#6673)  Jaemin Son
2024-04-14  convert : enable the `--use-temp-file` cli flag (#6645)  James A Capozzoli
2024-04-14  fix memcpy() crash, add missed cmd in guide, fix softmax (#6622)  Neo Zhang Jianyu
2024-04-14  CUDA: fix matrix multiplication logic for tests (#6667)  Johannes Gäßler
2024-04-13  model: support arch `DbrxForCausalLM` (#6515)  Pierrick Hymbert
2024-04-12  JSON schema conversion: ⚡️ faster repetitions, min/maxLength for strings,...  Olivier Chafik
2024-04-12  metal : unify mul_mv_id kernels (#6556)  slaren
2024-04-12  infill : add download instructions for model (#6626)  Daniel Bevenius
2024-04-12  server : coherent log output for KV cache full (#6637)  Pierrick Hymbert
2024-04-12  llama : add gguf_remove_key + remove split meta during quantize (#6591)  jiez
2024-04-12  chore: Fix markdown warnings (#6625)  Rene Leonhardt
2024-04-12  imatrix : remove invalid assert (#6632)  Georgi Gerganov
2024-04-12  Correct free memory and total memory. (#6630)  MasterYi1024
2024-04-12  eval-callback: use ggml_op_desc to pretty print unary operator name (#6631)  Pierrick Hymbert
2024-04-12  ci : disable Metal for macOS-latest-cmake-x64 (#6628)  Georgi Gerganov
2024-04-11  Optimization: eliminate addition of redundant stacks when advancing grammar. ...  Clint Herron
2024-04-11  As suggested by @slaren, disabling Metal for test to fix CI build on OSX from...  Clint Herron
2024-04-11  Refactor Error Handling for CUDA (#6575)  Nikolas
2024-04-11  grammars: 1.5x faster inference w/ complex grammars (vector reserves / reuses...  Olivier Chafik
2024-04-11  ci: download artifacts to release directory (#6612)  Hugo Roussel
2024-04-11  scripts : add --outdir option to hf.sh (#6600)  Daniel Bevenius
2024-04-11  eval-callback: Example how to use eval callback for debugging (#6576)  Pierrick Hymbert
2024-04-10  gguf : add option to not check tensor data (#6582)  Daniel Bevenius
2024-04-10  minor layout improvements (#6572)  Ralph Soika
2024-04-10  llama : add model types for mixtral (#6589)  slaren
2024-04-10  convert.py : add consolidated.safetensors for mixtral 8x22b (#6587)  slaren
2024-04-10  docs : how to add a model (#6565)  Pierrick Hymbert
2024-04-10  readme : fix ROCm link (#6579)  Artem Zinnatullin
2024-04-10  readme : update UI list (#6560)  sjxx
2024-04-10  readme: fix typo in amdgpu target name (#6573)  Jiří Sejkora
2024-04-09  BERT tokenizer fixes (#6498)  Jared Van Bortel