Age        | Commit message                                                                | Author
2023-08-23 | minor : fix trailing whitespace                                               | Georgi Gerganov
2023-08-23 | examples : restore the functionality to import llama2.c models (#2685)        | Olivier Chafik
2023-08-23 | fix convert-lora-to-ggml.py (#2738)                                           | slaren
2023-08-23 | main : insert bos if no tokens (#2727)                                        | klosax
2023-08-23 | gitignore : fix for windows (#2729)                                           | akawrykow
2023-08-23 | chmod : make scripts executable (#2675)                                       | Cebtenzzre
2023-08-23 | devops : RPM Specs (#2723)                                                    | JohnnyB
2023-08-23 | Fix values shown in the quantize tool help (#2735)                            | Kawrakow
2023-08-23 | Strided perplexity (#2714)                                                    | Kawrakow
2023-08-23 | Fix ggml to gguf conversion on Windows (#2733)                                | IgnacioFDM
2023-08-23 | server : allow json array in prompt or content for direct token input (#2306) | Xiao-Yong Jin
2023-08-22 | docs : add grammar docs (#2701)                                               | Evan Jones
2023-08-22 | Improve handling of special tokens in GGML to GGUF converter (#2725)          | Kerfuffle
2023-08-23 | llama : fix whitespace escaping in tokenizer (#2724)                          | goerch
2023-08-22 | CUDA: use mul_mat_q kernels by default (#2683)                                | Johannes Gäßler
2023-08-22 | convert.py : clarifying error message (#2718)                                 | Alex Petenchea
2023-08-22 | Fix CUDA softmax by subtracting max value before exp (#2665)                  | Jiahao Li
2023-08-22 | gguf : add ftype meta info to the model (#2710)                               | Georgi Gerganov
2023-08-22 | Quantization improvements for k_quants (#2707)                                | Kawrakow
2023-08-22 | embedding : evaluate prompt in batches (#2713)                                | slaren
2023-08-22 | ggml-cuda : use graph allocator (#2684)                                       | slaren
2023-08-22 | ggml : sync latest (SAM + SD operators, CUDA alibi) (#2709)                   | Georgi Gerganov
2023-08-22 | llama-bench : minor fixes (#2695)                                             | slaren
2023-08-22 | ggml : support CUDA's half type for aarch64 (#1455) (#2670)                   | Kylin
2023-08-22 | metal : add missing barriers for mul-mat (#2699)                              | Shouzheng Liu
2023-08-22 | server : fallback to default if client param is null (#2688)                  | Jhen-Jie Hong
2023-08-21 | Fix convert-llama-ggmlv3-to-gguf.py vocab conversion (#2698)                  | Kerfuffle
2023-08-21 | py : remove obsolete script                                                   | Georgi Gerganov
2023-08-21 | gguf : new file format with flexible meta data (beta) (#2398)                 | Georgi Gerganov
2023-08-21 | metal : fix synchronization in new matrix multiplication kernel (#2686)       | Shouzheng Liu
2023-08-21 | HellaSwag: split token evaluation into batches if needed (#2681)              | Kawrakow
2023-08-20 | ggml : move all type info to ggml_type_traits (#2663)                         | slaren
2023-08-20 | More efficient Hellaswag implementation (#2677)                               | Kawrakow
2023-08-19 | server : better default prompt (#2646)                                        | Georgi Gerganov
2023-08-19 | server : update xxd usage for older versions compatibility (#2649)            | Jhen-Jie Hong
2023-08-18 | Add link to clojure bindings to Readme. (#2659)                               | Adrian
2023-08-18 | readme : incoming BREAKING CHANGE                                             | Georgi Gerganov
2023-08-18 | llama : add benchmark example (#2626)                                         | slaren
2023-08-18 | readme : add link to Rust bindings (#2656)                                    | mdrokz
2023-08-18 | perplexity : more meaningful ETA number - 2 decimal points                    | Georgi Gerganov
2023-08-17 | Fix unicode in grammars (fixes #2501) (#2553)                                 | Evan Jones
2023-08-18 | server : support for saving templates in browser LocalStorage (#2486)        | staviq
2023-08-17 | README: fix LLAMA_CUDA_MMV_Y documentation (#2647)                            | Johannes Gäßler
2023-08-17 | [Zig] Fixing Zig build and improvements (#2554)                               | Henri Vasserman
2023-08-17 | Add --cfg-negative-prompt-file option for examples (#2591)                    | Kerfuffle
2023-08-17 | llama : replace (permute + reshape + view_1d) with (view_3d) (#2538)          | Georgi Gerganov
2023-08-17 | tests : adds simple llama grammar tests (#2618)                               | drbh
2023-08-17 | ggml-alloc : fix discrepancy between measure & eval (#2639)                   | Shouzheng Liu
2023-08-16 | cmake : install ggml-meta.metal if LLAMA_METAL (#2449)                        | Kolen Cheung
2023-08-16 | metal : print error of load pipeline state (#2564)                            | Jhen-Jie Hong