ik_llama.cpp.git (branch: main)
Age | Commit message | Author
2024-03-14 | ggml : designate enum vals for integer types (#6050) | Georgi Gerganov
2024-03-14 | embedding : print all resulting embeddings (#899) | Georgi Gerganov
2024-03-14 | metal : build metallib + fix embed path (#6015) | Georgi Gerganov
2024-03-14 | embedding : print cosine similarity (#899) | Georgi Gerganov
2024-03-13 | readme : update details about running llama in Termux on Android (#6039) | Linwei Wang
2024-03-13 | readme : update API changes and hot topics | Georgi Gerganov
2024-03-13 | grammar : handle missing "root" node (#6004) | Clint Herron
2024-03-13 | llama : add pipeline parallelism support (#6017) | slaren
2024-03-13 | test-backend-ops : skip CPU backend by default (#6028) | slaren
2024-03-13 | Update get version (#6025) | AidanBeltonS
2024-03-13 | Server: Use multi-task for embeddings endpoint (#6001) | Xuan Son Nguyen
2024-03-12 | ci : remove tidy-review (#6021) | slaren
2024-03-12 | ggml : reuse quantum structs across backends (#5943) | Georgi Gerganov
2024-03-12 | ggml : fix UB in IQ2_S and IQ3_S (#6012) | Georgi Gerganov
2024-03-12 | sycl : update IQ1_S kernels (WIP - not working!) (#5995) | Georgi Gerganov
2024-03-11 | grammar : fix unnecessarily retained pointer to rules (#6003) | gliptic
2024-03-11 | 1.5 bit: we can do even better (#5999) | Kawrakow
2024-03-11 | llama : more consistent names of count variables (#5994) | Georgi Gerganov
2024-03-11 | llama : refactor unicode stuff (#5992) | Georgi Gerganov
2024-03-11 | Update server docker image URLs (#5997) | Jakub N
2024-03-11 | Server: format error to json (#5961) | Xuan Son Nguyen
2024-03-11 | ggml, ci : Windows ARM runner and build fixes (#5979) | Michael Podvitskiy
2024-03-11 | server : maintain chat completion id for streaming responses (#5988) | Minsoo Cheong
2024-03-11 | cmake : fix subdir for `LLAMA_METAL_EMBED_LIBRARY` (#5985) | Gilad S
2024-03-11 | llama : fix F16/F32 downcast + improve names (#5980) | Georgi Gerganov
2024-03-11 | Better 1.5 bit quantization (#5971) | Kawrakow
2024-03-11 | [SYCL] Add q3_s and q1_s (#5886) | Abhilash Majumder
2024-03-11 | [SYCL] Add support for SYCL Nvidia target (#5738) | AidanBeltonS
2024-03-10 | metal : move mm_id indices to shared mem (#5982) | Georgi Gerganov
2024-03-10 | android : fix utf8 decoding error (#5935) | Dean
2024-03-10 | readme : update hot topics | Georgi Gerganov
2024-03-10 | sync : ggml | Georgi Gerganov
2024-03-10 | ggml : try fix 32-bit arm compat (whisper/1938) | Georgi Gerganov
2024-03-10 | ggml : remove __constant__ specifier for CUDA tables (#5940) | Georgi Gerganov
2024-03-10 | server: ci: windows build and tests (#5968) | Pierrick Hymbert
2024-03-10 | llama : add support for GritLM (#5959) | DAN™
2024-03-10 | grammar : verify parsed state (#5950) | Clint Herron
2024-03-10 | nix: update flake.lock (#5969) | Georgi Gerganov
2024-03-09 | server: benchmark: chat/completions scenario and other llm servers comparison... | Pierrick Hymbert
2024-03-09 | server : print chat template info | Georgi Gerganov
2024-03-09 | perplexity : support using multiple sequences to allow larger batch sizes (#5... | slaren
2024-03-09 | readme : update hot topics | Georgi Gerganov
2024-03-09 | ggml : fix unnecessary f32 -> f16 -> f32 casts (mmla) (#5951) | Georgi Gerganov
2024-03-09 | server : fix metrics init (#5964) | Georgi Gerganov
2024-03-09 | ggml : remove old quantization functions (#5942) | Georgi Gerganov
2024-03-09 | server : clarify some items in the readme (#5957) | Georgi Gerganov
2024-03-09 | server : normalize embeddings (#5956) | SeungWon Jeong
2024-03-09 | tests : gitignore ggml-common.h | Georgi Gerganov
2024-03-09 | server : fix passing prompt as tokens (#5955) | Alexey Parfenov
2024-03-09 | ggml : add ggml-common.h to deduplicate shared code (#5940) | Georgi Gerganov