ik_llama.cpp.git (branch: main)

Commit log

Age        | Commit message | Author
2024-05-23 | Add missing inference support for GPTNeoXForCausalLM (Pythia and GPT-NeoX bas... | fairydreaming
2024-05-23 | llama : rename n_ctx -> cache.size, less confusing (#0) | Georgi Gerganov
2024-05-23 | labeler.yml: add embedding label detector [no ci] (#7482) | Brian
2024-05-23 | ggml : remove ggml_flash_attn and ggml_flash_ff (#7463) | Georgi Gerganov
2024-05-23 | ggml : drop support for QK_K=64 (#7473) | Georgi Gerganov
2024-05-23 | Update vulkan rope implementation to support frequency factors (#7475) | 0cc4m
2024-05-23 | main : minor (#7462) | Georgi Gerganov
2024-05-23 | CUDA: fix FA out-of-bounds reads (#7479) | Johannes Gäßler
2024-05-23 | SimpleChat: a simple and dumb web front end for testing /chat/completions and... | HanishKVC
2024-05-22 | build : remove zig (#7471) | Georgi Gerganov
2024-05-22 | common : normalize naming style (#7462) | Georgi Gerganov
2024-05-22 | CUDA: fix FA out-of-bounds writes (#7465) | Johannes Gäßler
2024-05-22 | phi3 : duplicate rope factors in each layer (#7447) | slaren
2024-05-22 | vulkan: add workaround for iterator boundary check to fix clang-cl debug buil... | k.h.lai
2024-05-22 | llama : add missing model type names (#7445) | Justine Tunney
2024-05-22 | cuda : fix compile warning (#7454) | Georgi Gerganov
2024-05-22 | CUDA: remove incorrect precision check (#7454) | Johannes Gäßler
2024-05-22 | cuda : fix rope + add tests (#7452) | Georgi Gerganov
2024-05-21 | llama : add phi3 128K model support (#7225) | liuwei-git
2024-05-21 | metal : handle F16 inf values, fix FA partial offload (#7434) | Georgi Gerganov
2024-05-21 | `grammars`: fix resampling logic regression (#7424) | Olivier Chafik
2024-05-21 | CUDA: fix unused warning in mmq.cu (#7442) | Johannes Gäßler
2024-05-21 | tests : test-tokenizer-0.sh print more info (#7402) | Georgi Gerganov
2024-05-21 | examples: cache hf model when --model not provided (#7353) | Amir
2024-05-21 | CUDA: deduplicate mmq code (#7397) | Johannes Gäßler
2024-05-21 | Tokenizer SPM fixes for phi-3 and llama-spm (bugfix) (#7425) | jaime-m-p
2024-05-20 | Tokenizer SPM fixes for phi-3 and llama-spm (#7375) | jaime-m-p
2024-05-21 | llama : remove Persimmon (#7408) | Georgi Gerganov
2024-05-20 | perplexity: update README FP16 results [no ci] (#7413) | Johannes Gäßler
2024-05-20 | rpc : track allocated buffers (#7411) | Radoslav Gerganov
2024-05-20 | server : fix temperature + disable some tests (#7409) | Georgi Gerganov
2024-05-20 | [SYCL] Update SYCL upscale operation (#7321) | AidanBeltonS
2024-05-20 | Update README.md (#7410) | Bingan
2024-05-20 | ggml-opencl, llama: using reserve() if count already known (#7272) | Herman Semenov
2024-05-20 | ggml : add loongarch lsx and lasx support (#6454) | junchao-loongson
2024-05-20 | server : tuning tests (#7388) | Georgi Gerganov
2024-05-20 | server : return error on too large embedding input (#7389) | Georgi Gerganov
2024-05-20 | tests : fix --keep_split -> --keep-split (#7374) | Georgi Gerganov
2024-05-20 | Add provisions for windows support for BF16 code including CMake provision fo... | Srihari-mcw
2024-05-20 | llama : remove MPI backend (#7395) | slaren
2024-05-19 | quantize : fix --keep-split check (#7374) | Fred Douglas
2024-05-19 | Vulkan Embedding Fix (#7360) | 0cc4m
2024-05-19 | ggml : fix another case of quants nans (#7387) | slaren
2024-05-19 | ggml: implement quantized KV cache for FA (#7372) | Johannes Gäßler
2024-05-19 | server: add test for token probs (#7347) | Johannes Gäßler
2024-05-19 | server: fix seed being reported back (#7382) | Johannes Gäßler
2024-05-19 | Add StableLM2 pre-tokenizer (#7349) | Anas Ahouzi
2024-05-19 | cuda : clear error after buffer allocation failure (#7376) | slaren
2024-05-19 | labeler.yml: Use settings from ggerganov/llama.cpp [no ci] (#7363) | Brian
2024-05-19 | cmake : update android comments (#7341) | Georgi Gerganov