ik_llama.cpp.git (branch: main)
path: root/tests
Age | Commit message | Author
2024-05-15 | ggml : add `ggml_upscale_ext` (ggml/814) | John Balis
2024-05-14 | metal : support FA without mask + add asserts (#7278) | Georgi Gerganov
2024-05-14 | Add left recursion check: quit early instead of going into an infinite loop (... | Haggai Nuchi
2024-05-12 | CUDA: add FP32 FlashAttention vector kernel (#7188) | Johannes Gäßler
2024-05-11 | llama : lookup word in vocab before doing BPE merges (#7193) | Haoxiang Fei
2024-05-11 | ggml : full ALiBi support (#7192) | Georgi Gerganov
2024-05-09 | llama3 custom regex split (#6965) | jaime-m-p
2024-05-09 | CUDA: generalize FP16 fattn vec kernel (#7061) | Johannes Gäßler
2024-05-08 | JSON: [key] -> .at(key), assert() -> GGML_ASSERT (#7143) | Johannes Gäßler
2024-05-08 | llama : add BPE pre-tokenization for Qwen2 (#7114) | Ren Xuancheng
2024-05-08 | ggml : introduce bfloat16 support (#6412) | Justine Tunney
2024-05-05 | command-r : add BPE pre-tokenization (#7063) | DAN™
2024-05-05 | py : logging and flake8 suppression refactoring (#7081) | Brian
2024-05-04 | tests : add test-tokenizer-0.sh + fix some tokenizers (#7036) | Georgi Gerganov
2024-05-03 | convert.py : add python logging instead of print() (#6511) | Brian
2024-04-30 | ggml : add Flash Attention (#5021) | Georgi Gerganov
2024-04-29 | Extending grammar integration tests (#6644) | Clint Herron
2024-04-29 | llama : fix BPE pre-tokenization (#6920) | Georgi Gerganov
2024-04-24 | llama : add phi 3 chat template (#6857) | Tristan Druyen
2024-04-21 | llama : add llama-3 chat template (#6751) | Wouter
2024-04-18 | ggml : group all experts in a single ggml_mul_mat_id (#6505) | slaren
2024-04-16 | llama : add qwen2moe (#6074) | Shijie
2024-04-15 | `main`: add --json-schema / -j flag (#6659) | Olivier Chafik
2024-04-14 | Add Command R chat template (#6650) | Chao Jiang
2024-04-12 | JSON schema conversion: ⚡️ faster repetitions, min/maxLength for strings,... | Olivier Chafik
2024-04-12 | metal : unify mul_mv_id kernels (#6556) | slaren
2024-04-11 | grammars: 1.5x faster inference w/ complex grammars (vector reserves / reuses... | Olivier Chafik
2024-04-06 | Tests: Added integration tests for GBNF parser (#6472) | Clint Herron
2024-04-03 | Add OpenChat, Alpaca, Vicuna chat templates (#6397) | kaizau
2024-04-03 | ggml : mul_mat_id use the same tensor for all the experts (#6387) | slaren
2024-03-26 | IQ1_M: 1.75 bpw quantization (#6302) | Kawrakow
2024-03-25 | tests : include IQ2_XXS and IQ2_XS in test-quantize-fns (#6303) | Kawrakow
2024-03-22 | tests : conditional python & node json schema tests (#6207) | Olivier Chafik
2024-03-22 | json-schema-to-grammar : fix order of props + non-str const/enum (#6232) | Olivier Chafik
2024-03-22 | metal : pad n_ctx by 32 (#6177) | Georgi Gerganov
2024-03-21 | tests : disable system() calls (#6198) | Georgi Gerganov
2024-03-21 | json-schema-to-grammar improvements (+ added to server) (#5978) | Olivier Chafik
2024-03-15 | llama : add Orion chat template (#6066) | Xuan Son Nguyen
2024-03-13 | test-backend-ops : skip CPU backend by default (#6028) | slaren
2024-03-11 | llama : refactor unicode stuff (#5992) | Georgi Gerganov
2024-03-09 | ggml : remove old quantization functions (#5942) | Georgi Gerganov
2024-03-09 | tests : gitignore ggml-common.h | Georgi Gerganov
2024-03-04 | add some new ops, fix some operators and add batch operations to certain oper... | leejet
2024-02-27 | IQ4_XS: a 4.25 bpw quantization (#5747) | Kawrakow
2024-02-26 | Adding IQ2_S and IQ2_M to complete coverage of the 2-3 bit quantization range... | Kawrakow
2024-02-25 | code : normalize enum names (#5697) | Georgi Gerganov
2024-02-24 | IQ3_S: a much better alternative to Q3_K (#5676) | Kawrakow
2024-02-22 | Add Gemma chat template (#5665) | Xuan Son Nguyen
2024-02-22 | server : fallback to chatml, add AlphaMonarch chat template (#5628) | Xuan Son Nguyen
2024-02-21 | IQ4_NL: 4-bit non-linear quants with blocks of 32 (#5590) | Kawrakow