Age  Commit message  Author
2024-06-14  llama : more checks before assuming FIM tokens (#7644)  Sigbjørn Skjæret
2024-06-14  convert : add Poro-34B-chat tokenizer support (#7713)  Elaine
2024-06-13  rpc : fix ggml_backend_rpc_supports_buft() (#7918)  Radoslav Gerganov
2024-06-13  readme : Remove outdated instructions from README.md (#7914) [no ci]  Galunid
2024-06-13  move BLAS to a separate backend (#6210)  slaren
2024-06-13  `build`: rename main → llama-cli, server → llama-server, llava-cli → ll...  Olivier Chafik
2024-06-12  CUDA: fix broken oob check for FA vec f32 kernel (#7904)  Johannes Gäßler
2024-06-12  tests : add non-cont unary tests (#7857)  Georgi Gerganov
2024-06-12  ggml : improve ggml_is_contiguous logic (#7856)  Georgi Gerganov
2024-06-12  server : restore numeric prompts (#7883)  Georgi Gerganov
2024-06-12  update intel docker oneapi-basekit to 2024.1.1-devel-ubuntu22.04 (#7894)  Meng, Hengyu
2024-06-12  Fix a typo and add Fedora 40 pacakge to install for Vulkan (#7794) [no ci]  Patrice Ferlet
2024-06-11  vulkan: select only one device for single gpu with multiple drivers (#7582)  k.h.lai
2024-06-11  Update Vulkan RoPE implementation (#7818)  0cc4m
2024-06-12  fix broken link in pr template (#7880) [no ci]  Deven Mistry
2024-06-11  github: move PR template to .github/ root (#7868)  Brian
2024-06-11  llama-bench: more compact markdown tables (#7879)  Johannes Gäßler
2024-06-11  tests : check the Python version (#7872)  Georgi Gerganov
2024-06-11  CUDA: int8 tensor cores for MMQ (q4_K, q5_K, q6_K) (#7860)  Johannes Gäßler
2024-06-11  fix CUDA CI by using a windows-2019 image (#7861)  slaren
2024-06-11  json: refine constraint for whitespace to avoid runaways yet allow pretty pri...  Olivier Chafik
2024-06-11  `json`: document schema conversion in GBNF readme, align manual grammar examp...  Olivier Chafik
2024-06-10  cmake : fix CMake requirement for CUDA (#7821)  Jared Van Bortel
2024-06-10  ci : try win-2019 on server windows test (#7854)  slaren
2024-06-10  examples : remove --instruct remnants (#7846)  Georgi Gerganov
2024-06-10  server : improve "prompt" handling (#7847)  Georgi Gerganov
2024-06-10  CUDA: use tensor cores for MMQ (#7676)  Johannes Gäßler
2024-06-10  use the correct SYCL context for host USM allocations (#7777)  Ben Ashbaugh
2024-06-09  flake.lock: Update (#7838)  Georgi Gerganov
2024-06-09  imatrix : handle partial entries (#7833)  Georgi Gerganov
2024-06-10  docs: Added initial PR template with directions for doc only changes and squa...  Nicolás Pérez
2024-06-09  server: do not remove whitespace at the start of a completion chunk (#7830)  mgroeber9110
2024-06-09  CUDA: revise q8_1 data layout for mul_mat_q (#7824)  Johannes Gäßler
2024-06-09  convert-hf : set the model name based on cli arg, if present (#7693)  sasha0552
2024-06-09  convert-hf : match model part name prefix and suffix (#7687)  compilade
2024-06-09  gguf-py : decouple adding metadata from writing in GGUFWriter (#7827)  compilade
2024-06-09  Revert "[SYCL] Update rpc-server.cpp to include SYCL backend (#7682)" (#7808)  slaren
2024-06-08  url: save -mu downloads to new cache location (#7826)  Olivier Chafik
2024-06-08  server : smart slot selection using Longest Common Prefix (#7728)  sasha0552
2024-06-07  vulkan : reuse parent extra for views (#7806)  slaren
2024-06-07  gguf-split : change binary multi-byte units to decimal (#7803)  Christian Zhou-Zheng
2024-06-07  cmake : fix BUILD_SHARED_LIBS=ON build (#7784)  intelmatt
2024-06-07  server: update cache_prompt documentation [no ci] (#7745)  Johannes Gäßler
2024-06-07  server : do not get prompt in infill mode (#7286)  woodx
2024-06-07  [SYCL] fix softmax r2r result wrong issue (#7811)  pengxin99
2024-06-07  check for nans in imatrix and quantize (#7807)  slaren
2024-06-06  server : fix --threads-http arg (#7801)  Georgi Gerganov
2024-06-06  imatrix : migrate to gpt_params (#7771)  Georgi Gerganov
2024-06-06  Added support for . (any character) token in grammar engine. (#6467)  Clint Herron
2024-06-06  README minor fixes (#7798) [no ci]  Mattheus Chediak