Age  Commit message  Author
2024-03-14  ggml : designate enum vals for integer types (#6050)  Georgi Gerganov
2024-03-14  embedding : print all resulting embeddings (#899)  Georgi Gerganov
2024-03-14  metal : build metallib + fix embed path (#6015)  Georgi Gerganov
2024-03-14  embedding : print cosine similarity (#899)  Georgi Gerganov
2024-03-13  readme : update details about running llama in Termux on Android (#6039)  Linwei Wang
2024-03-13  readme : update API changes and hot topics  Georgi Gerganov
2024-03-13  grammar : handle missing "root" node (#6004)  Clint Herron
2024-03-13  llama : add pipeline parallelism support (#6017)  slaren
2024-03-13  test-backend-ops : skip CPU backend by default (#6028)  slaren
2024-03-13  Update get version (#6025)  AidanBeltonS
2024-03-13  Server: Use multi-task for embeddings endpoint (#6001)  Xuan Son Nguyen
2024-03-12  ci : remove tidy-review (#6021)  slaren
2024-03-12  ggml : reuse quantum structs across backends (#5943)  Georgi Gerganov
2024-03-12  ggml : fix UB in IQ2_S and IQ3_S (#6012)  Georgi Gerganov
2024-03-12  sycl : update IQ1_S kernels (WIP - not working!) (#5995)  Georgi Gerganov
2024-03-11  grammar : fix unnecessarily retained pointer to rules (#6003)  gliptic
2024-03-11  1.5 bit: we can do even better (#5999)  Kawrakow
2024-03-11  llama : more consistent names of count variables (#5994)  Georgi Gerganov
2024-03-11  llama : refactor unicode stuff (#5992)  Georgi Gerganov
2024-03-11  Update server docker image URLs (#5997)  Jakub N
2024-03-11  Server: format error to json (#5961)  Xuan Son Nguyen
2024-03-11  ggml, ci : Windows ARM runner and build fixes (#5979)  Michael Podvitskiy
2024-03-11  server : maintain chat completion id for streaming responses (#5988)  Minsoo Cheong
2024-03-11  cmake : fix subdir for `LLAMA_METAL_EMBED_LIBRARY` (#5985)  Gilad S
2024-03-11  llama : fix F16/F32 downcast + improve names (#5980)  Georgi Gerganov
2024-03-11  Better 1.5 bit quantization (#5971)  Kawrakow
2024-03-11  [SYCL] Add q3_s and q1_s (#5886)  Abhilash Majumder
2024-03-11  [SYCL] Add support for SYCL Nvidia target (#5738)  AidanBeltonS
2024-03-10  metal : move mm_id indices to shared mem (#5982)  Georgi Gerganov
2024-03-10  android : fix utf8 decoding error (#5935)  Dean
2024-03-10  readme : update hot topics  Georgi Gerganov
2024-03-10  sync : ggml  Georgi Gerganov
2024-03-10  ggml : try fix 32-bit arm compat (whisper/1938)  Georgi Gerganov
2024-03-10  ggml : remove __constant__ specifier for CUDA tables (#5940)  Georgi Gerganov
2024-03-10  server: ci: windows build and tests (#5968)  Pierrick Hymbert
2024-03-10  llama : add support for GritLM (#5959)  DAN™
2024-03-10  grammar : verify parsed state (#5950)  Clint Herron
2024-03-10  nix: update flake.lock (#5969)  Georgi Gerganov
2024-03-09  server: benchmark: chat/completions scenario and other llm servers comparison...  Pierrick Hymbert
2024-03-09  server : print chat template info  Georgi Gerganov
2024-03-09  perplexity : support using multiple sequences to allow larger batch sizes (#5...  slaren
2024-03-09  readme : update hot topics  Georgi Gerganov
2024-03-09  ggml : fix unnecessary f32 -> f16 -> f32 casts (mmla) (#5951)  Georgi Gerganov
2024-03-09  server : fix metrics init (#5964)  Georgi Gerganov
2024-03-09  ggml : remove old quantization functions (#5942)  Georgi Gerganov
2024-03-09  server : clarify some items in the readme (#5957)  Georgi Gerganov
2024-03-09  server : normalize embeddings (#5956)  SeungWon Jeong
2024-03-09  tests : gitignore ggml-common.h  Georgi Gerganov
2024-03-09  server : fix passing prompt as tokens (#5955)  Alexey Parfenov
2024-03-09  ggml : add ggml-common.h to deduplicate shared code (#5940)  Georgi Gerganov