ik_llama.cpp.git (branch: main)
Age
Commit message
Author
2024-03-26
quantize : be able to override metadata by key (#6321)
Kawrakow
2024-03-26
embedding : adjust `n_ubatch` value (#6296)
Minsoo Cheong
2024-03-26
server : add `n_discard` parameter (#6300)
Jan Boon
2024-03-25
nix: make `xcrun` visible in Nix sandbox for precompiling Metal shaders (#6118)
Joseph Stahl
2024-03-26
cuda : rename build flag to LLAMA_CUDA (#6299)
slaren
2024-03-25
nix: fix blas support (#6281)
Christian Kögler
2024-03-25
tests : include IQ2_XXS and IQ2_XS in test-quantize-fns (#6303)
Kawrakow
2024-03-25
flake.lock: Update (#6266)
Georgi Gerganov
2024-03-25
cuda : fix LLAMA_CUDA_F16 build (#6298)
slaren
2024-03-25
cuda : refactor into multiple files (#6269)
slaren
2024-03-25
Server: clean up OAI params parsing function (#6284)
Xuan Son Nguyen
2024-03-25
[SYCL] fix SYCL backend build on windows is break by LOG() error (#6290)
Neo Zhang Jianyu
2024-03-25
examples : add "retrieval" (#6193)
Minsoo Cheong
2024-03-25
ggml : support AVX512VNNI (#6280)
Justine Tunney
2024-03-24
Fix heap corruption from wmode out-of-bound writes on windows (#6272)
Rick G
2024-03-24
imatrix : fix wname for mul_mat_id ops (#6271)
Georgi Gerganov
2024-03-24
Fixed lookup compilation issues on Windows (#6273)
Johannes Gäßler
2024-03-24
ci : close inactive issue, increase operations per run (#6270)
Pierrick Hymbert
2024-03-24
sampling : deduplicated code for probability distribution access (#6240)
Minsoo Cheong
2024-03-24
[SYCL] offload op (#6217)
Meng, Hengyu
2024-03-24
Support build win release for SYCL (#6241)
Neo Zhang Jianyu
2024-03-23
use _wfopen instead of fopen on Windows (#6248)
Jared Van Bortel
2024-03-23
gitignore : gguf-split
Georgi Gerganov
2024-03-23
common: llama_load_model_from_url split support (#6192)
Pierrick Hymbert
2024-03-23
server: docs: `--threads` and `--threads`, `--ubatch-size`, `--log-disable` (...
Pierrick Hymbert
2024-03-23
llama : add grok-1 support (#6204)
Julius Arkenberg
2024-03-23
split: add gguf-split in the make build target (#6262)
Pierrick Hymbert
2024-03-23
server: flush stdout after logging in both text and json layout (#6253)
Pierrick Hymbert
2024-03-23
lookup: complement data from context with general text statistics (#5479)
Johannes Gäßler
2024-03-22
common : default --hf-file to --model (#6234)
Georgi Gerganov
2024-03-22
convert-llama2c-to-ggml : enable conversion of GQA models (#6237)
fraxy-v
2024-03-22
quantize: options for output and token embedding tensors qtype (#6239)
Kawrakow
2024-03-22
llama_model_loader: support multiple split/shard GGUFs (#6187)
Pierrick Hymbert
2024-03-22
ci: apply concurrency limit for github workflows (#6243)
Minsoo Cheong
2024-03-22
common : add HF arg helpers (#6234)
Georgi Gerganov
2024-03-22
llama : correction of the attn.v.weight quantization for IQ3_XS (#6209)
Nexesenex
2024-03-22
tests : conditional python & node json schema tests (#6207)
Olivier Chafik
2024-03-22
json-schema-to-grammar : fix order of props + non-str const/enum (#6232)
Olivier Chafik
2024-03-22
cuda : add LLAMA_CUDA_NO_PEER_COPY to workaround broken ROCm p2p copy (#6208)
slaren
2024-03-22
readme : add RecurseChat to the list of UIs (#6219)
Xiaoyi Chen
2024-03-22
server : fix n_keep always showing as 0 in response (#6211)
Jan Boon
2024-03-22
server : enable continuous batching by default (#6231)
Georgi Gerganov
2024-03-22
metal : proper assert for mat-mat memory alignment (#6225)
Georgi Gerganov
2024-03-22
ci : add CURL flag for the mac builds (#6214)
Vaibhav Srivastav
2024-03-22
metal : pad n_ctx by 32 (#6177)
Georgi Gerganov
2024-03-22
add blog link (#6222)
Neo Zhang Jianyu
2024-03-22
Fix params underscore convert to dash. (#6203)
DAN™
2024-03-21
server : update readme doc from `slot_id` to `id_slot` (#6213)
Jan Boon
2024-03-21
cuda : disable host register by default (#6206)
slaren
2024-03-21
Corrected typo to wrong file (#6199)
semidark