| Age | Commit message | Author |
|------------|----------------|--------|
| 2024-05-15 | embedding : free the batch after execution (#7297) | dm4 |
| 2024-05-15 | sync : ggml | Georgi Gerganov |
| 2024-05-15 | ggml : add `ggml_upscale_ext` (ggml/814) | John Balis |
| 2024-05-15 | server bench: fix bench not waiting for model load (#7284) | Johannes Gäßler |
| 2024-05-14 | script : sync ggml-rpc | Georgi Gerganov |
| 2024-05-14 | metal : support FA without mask + add asserts (#7278) | Georgi Gerganov |
| 2024-05-14 | sync : ggml | Georgi Gerganov |
| 2024-05-14 | metal : tune soft_max number of threads (whisper/0) | Georgi Gerganov |
| 2024-05-14 | ggml : try fix ppc64 (whisper/0) | Georgi Gerganov |
| 2024-05-14 | ggml : expose SSE3 and SSSE3 for MSVC when AVX is available (whisper/2128) | Przemysław Pawełczyk |
| 2024-05-14 | ggml : optimize for ppc64le using VSX intrinsics (ggml/784) | Hong Bo PENG |
| 2024-05-14 | server: free sampling contexts on exit (#7264) | Steve Grubb |
| 2024-05-14 | Revert "move ndk code to a new library (#6951)" (#7282) | Brian |
| 2024-05-14 | ggml : add RPC backend (#6829) | Radoslav Gerganov |
| 2024-05-14 | llama : disable pipeline parallelism with nkvo (#7265) | slaren |
| 2024-05-14 | move ndk code to a new library (#6951) | Elton Kola |
| 2024-05-14 | Add left recursion check: quit early instead of going into an infinite loop (... | Haggai Nuchi |
| 2024-05-14 | docs: Fix typo and update description for --embeddings flag (#7026) | Ryuei |
| 2024-05-13 | convert-hf : support direct Q8_0 conversion (#7234) | compilade |
| 2024-05-13 | llama : less KV padding when FA is off (#7257) | Georgi Gerganov |
| 2024-05-14 | llava-cli: fix base64 prompt (#7248) | k.h.lai |
| 2024-05-13 | perplexity: add BF16 vs. FP16 results (#7150) | Johannes Gäßler |
| 2024-05-13 | [SYCL] rm wait() (#7233) | Neo Zhang |
| 2024-05-13 | llama : rename jina tokenizers to v2 (#7249) | Joan Fontanals |
| 2024-05-13 | convert.py: Outfile default name change and additional metadata support (#4858) | Brian |
| 2024-05-13 | change default temperature of OAI compat API from 0 to 1 (#7226) | Benjamin Findley |
| 2024-05-13 | [SYCL] Add oneapi runtime dll files to win release package (#7241) | Neo Zhang |
| 2024-05-13 | [SYCL] update CI with oneapi 2024.1 (#7235) | Neo Zhang |
| 2024-05-12 | CUDA: add FP32 FlashAttention vector kernel (#7188) | Johannes Gäßler |
| 2024-05-12 | cmake : fix version cmp (#7227) | Georgi Gerganov |
| 2024-05-12 | remove convert-lora-to-ggml.py (#7204) | slaren |
| 2024-05-11 | metal : fix warnings (skipme) (#0) | Georgi Gerganov |
| 2024-05-11 | sync : ggml | Georgi Gerganov |
| 2024-05-11 | metal : fix indent (ggml/0) | Georgi Gerganov |
| 2024-05-11 | ggml : resolve merge (ggml/0) | Georgi Gerganov |
| 2024-05-12 | Scripting & documenting debugging one test without anything else in the loop.... | Josh Ramer |
| 2024-05-11 | fix system prompt handling (#7153) | Xuan Son Nguyen |
| 2024-05-11 | convert-hf : support bfloat16 conversion (#7158) | compilade |
| 2024-05-11 | sync : ggml | Georgi Gerganov |
| 2024-05-11 | feat: implemented sigmoid function (ggml/806) | Justina Cho |
| 2024-05-11 | build: fix and ignore msvc warnings (ggml/805) | Borislav Stanimirov |
| 2024-05-11 | convert : skip unaccessible HF repos (#7210) | CrispStrobe |
| 2024-05-11 | server : free llama_batch on exit (#7212) | Steve Grubb |
| 2024-05-11 | llama : lookup word in vocab before doing BPE merges (#7193) | Haoxiang Fei |
| 2024-05-11 | server: fix reported top tokens for temperature 0 (#7203) | Johannes Gäßler |
| 2024-05-11 | llama : add Jina Embeddings architecture (#6826) | Joan Fontanals |
| 2024-05-11 | ggml : full ALiBi support (#7192) | Georgi Gerganov |
| 2024-05-10 | llama-bench : add pp+tg test type (#7199) | slaren |
| 2024-05-10 | metal : fix flash attention kernel requirements (#7169) | Georgi Gerganov |
| 2024-05-10 | convert : print "ignore_merges" field | Georgi Gerganov |