Age         Commit message (Author)
2024-05-23  Add missing inference support for GPTNeoXForCausalLM (Pythia and GPT-NeoX bas...  (fairydreaming)
2024-05-23  llama : rename n_ctx -> cache.size, less confusing (#0)  (Georgi Gerganov)
2024-05-23  labeler.yml: add embedding label detector [no ci] (#7482)  (Brian)
2024-05-23  ggml : remove ggml_flash_attn and ggml_flash_ff (#7463)  (Georgi Gerganov)
2024-05-23  ggml : drop support for QK_K=64 (#7473)  (Georgi Gerganov)
2024-05-23  Update vulkan rope implementation to support frequency factors (#7475)  (0cc4m)
2024-05-23  main : minor (#7462)  (Georgi Gerganov)
2024-05-23  CUDA: fix FA out-of-bounds reads (#7479)  (Johannes Gäßler)
2024-05-23  SimpleChat: a simple and dumb web front end for testing /chat/completions and...  (HanishKVC)
2024-05-22  build : remove zig (#7471)  (Georgi Gerganov)
2024-05-22  common : normalize naming style (#7462)  (Georgi Gerganov)
2024-05-22  CUDA: fix FA out-of-bounds writes (#7465)  (Johannes Gäßler)
2024-05-22  phi3 : duplicate rope factors in each layer (#7447)  (slaren)
2024-05-22  vulkan: add workaround for iterator boundary check to fix clang-cl debug buil...  (k.h.lai)
2024-05-22  llama : add missing model type names (#7445)  (Justine Tunney)
2024-05-22  cuda : fix compile warning (#7454)  (Georgi Gerganov)
2024-05-22  CUDA: remove incorrect precision check (#7454)  (Johannes Gäßler)
2024-05-22  cuda : fix rope + add tests (#7452)  (Georgi Gerganov)
2024-05-21  llama : add phi3 128K model support (#7225)  (liuwei-git)
2024-05-21  metal : handle F16 inf values, fix FA partial offload (#7434)  (Georgi Gerganov)
2024-05-21  `grammars`: fix resampling logic regression (#7424)  (Olivier Chafik)
2024-05-21  CUDA: fix unused warning in mmq.cu (#7442)  (Johannes Gäßler)
2024-05-21  tests : test-tokenizer-0.sh print more info (#7402)  (Georgi Gerganov)
2024-05-21  examples: cache hf model when --model not provided (#7353)  (Amir)
2024-05-21  CUDA: deduplicate mmq code (#7397)  (Johannes Gäßler)
2024-05-21  Tokenizer SPM fixes for phi-3 and llama-spm (bugfix) (#7425)  (jaime-m-p)
2024-05-20  Tokenizer SPM fixes for phi-3 and llama-spm (#7375)  (jaime-m-p)
2024-05-21  llama : remove Persimmon (#7408)  (Georgi Gerganov)
2024-05-20  perplexity: update README FP16 results [no ci] (#7413)  (Johannes Gäßler)
2024-05-20  rpc : track allocated buffers (#7411)  (Radoslav Gerganov)
2024-05-20  server : fix temperature + disable some tests (#7409)  (Georgi Gerganov)
2024-05-20  [SYCL] Update SYCL upscale operation (#7321)  (AidanBeltonS)
2024-05-20  Update README.md (#7410)  (Bingan)
2024-05-20  ggml-opencl, llama: using reserve() if count already known (#7272)  (Herman Semenov)
2024-05-20  ggml : add loongarch lsx and lasx support (#6454)  (junchao-loongson)
2024-05-20  server : tuning tests (#7388)  (Georgi Gerganov)
2024-05-20  server : return error on too large embedding input (#7389)  (Georgi Gerganov)
2024-05-20  tests : fix --keep_split -> --keep-split (#7374)  (Georgi Gerganov)
2024-05-20  Add provisions for windows support for BF16 code including CMake provision fo...  (Srihari-mcw)
2024-05-20  llama : remove MPI backend (#7395)  (slaren)
2024-05-19  quantize : fix --keep-split check (#7374)  (Fred Douglas)
2024-05-19  Vulkan Embedding Fix (#7360)  (0cc4m)
2024-05-19  ggml : fix another case of quants nans (#7387)  (slaren)
2024-05-19  ggml: implement quantized KV cache for FA (#7372)  (Johannes Gäßler)
2024-05-19  server: add test for token probs (#7347)  (Johannes Gäßler)
2024-05-19  server: fix seed being reported back (#7382)  (Johannes Gäßler)
2024-05-19  Add StableLM2 pre-tokenizer (#7349)  (Anas Ahouzi)
2024-05-19  cuda : clear error after buffer allocation failure (#7376)  (slaren)
2024-05-19  labeler.yml: Use settings from ggerganov/llama.cpp [no ci] (#7363)  (Brian)
2024-05-19  cmake : update android comments (#7341)  (Georgi Gerganov)