ik_llama.cpp.git (branch: main) - commit log
Age         Commit message  [Author]
2024-03-15  llava : change API to pure C style for Rust FFI bindgen (#6079)  [Ting Lou]
2024-03-15  cuda : disable unused cudaLaunchHostFunc code (#6078)  [slaren]
2024-03-15  fix set main gpu error (#6073)  [Neo Zhang Jianyu]
2024-03-15  make : ggml-metal.o depends on ggml.h  [Georgi Gerganov]
2024-03-15  [SYCL] Fix non-intel device selection (#6042)  [AidanBeltonS]
2024-03-15  gguf : add support for I64 and F64 arrays (#6062)  [Ondřej Čertík]
2024-03-15  llama : add Orion chat template (#6066)  [Xuan Son Nguyen]
2024-03-15  llama-bench : use random tokens to improve accuracy with mixtral (#6069)  [slaren]
2024-03-14  llama : fix integer overflow during quantization (#6063)  [Georgi Gerganov]
2024-03-14  gguf : fix resource leaks (#6061)  [Steve Grubb]
2024-03-14  gguf-py : bump version to 0.8.0 (#6060)  [Ondřej Čertík]
2024-03-14  llama : support models without vocabulary (#5798)  [Michael Podvitskiy]
2024-03-14  embedding : add EOS token if not present (#899)  [Georgi Gerganov]
2024-03-14  gguf-py : fix dtype check (#6045)  [Georgi Gerganov]
2024-03-14  readme : improve readme for Llava-1.6 example (#6044)  [Jian Liao]
2024-03-14  server: disable debug release type sanitizer, simplify trigger (#6047)  [Pierrick Hymbert]
2024-03-14  llama : fix typo  [Georgi Gerganov]
2024-03-14  llama : optimize defrag moves + fix fragmentation calculation (#6037)  [Michael Podvitskiy]
2024-03-14  gguf-py : add support for I8, I16 and I32 (#6045)  [Ondřej Čertík]
2024-03-14  ggml : designate enum vals for integer types (#6050)  [Georgi Gerganov]
2024-03-14  embedding : print all resulting embeddings (#899)  [Georgi Gerganov]
2024-03-14  metal : build metallib + fix embed path (#6015)  [Georgi Gerganov]
2024-03-14  embedding : print cosine similarity (#899)  [Georgi Gerganov]
2024-03-13  readme : update details about running llama in Termux on Android (#6039)  [Linwei Wang]
2024-03-13  readme : update API changes and hot topics  [Georgi Gerganov]
2024-03-13  grammar : handle missing "root" node (#6004)  [Clint Herron]
2024-03-13  llama : add pipeline parallelism support (#6017)  [slaren]
2024-03-13  test-backend-ops : skip CPU backend by default (#6028)  [slaren]
2024-03-13  Update get version (#6025)  [AidanBeltonS]
2024-03-13  Server: Use multi-task for embeddings endpoint (#6001)  [Xuan Son Nguyen]
2024-03-12  ci : remove tidy-review (#6021)  [slaren]
2024-03-12  ggml : reuse quantum structs across backends (#5943)  [Georgi Gerganov]
2024-03-12  ggml : fix UB in IQ2_S and IQ3_S (#6012)  [Georgi Gerganov]
2024-03-12  sycl : update IQ1_S kernels (WIP - not working!) (#5995)  [Georgi Gerganov]
2024-03-11  grammar : fix unnecessarily retained pointer to rules (#6003)  [gliptic]
2024-03-11  1.5 bit: we can do even better (#5999)  [Kawrakow]
2024-03-11  llama : more consistent names of count variables (#5994)  [Georgi Gerganov]
2024-03-11  llama : refactor unicode stuff (#5992)  [Georgi Gerganov]
2024-03-11  Update server docker image URLs (#5997)  [Jakub N]
2024-03-11  Server: format error to json (#5961)  [Xuan Son Nguyen]
2024-03-11  ggml, ci : Windows ARM runner and build fixes (#5979)  [Michael Podvitskiy]
2024-03-11  server : maintain chat completion id for streaming responses (#5988)  [Minsoo Cheong]
2024-03-11  cmake : fix subdir for `LLAMA_METAL_EMBED_LIBRARY` (#5985)  [Gilad S]
2024-03-11  llama : fix F16/F32 downcast + improve names (#5980)  [Georgi Gerganov]
2024-03-11  Better 1.5 bit quantization (#5971)  [Kawrakow]
2024-03-11  [SYCL] Add q3_s and q1_s (#5886)  [Abhilash Majumder]
2024-03-11  [SYCL] Add support for SYCL Nvidia target (#5738)  [AidanBeltonS]
2024-03-10  metal : move mm_id indices to shared mem (#5982)  [Georgi Gerganov]
2024-03-10  android : fix utf8 decoding error (#5935)  [Dean]
2024-03-10  readme : update hot topics  [Georgi Gerganov]