Original patch by @gabriellarson:
https://github.com/ggml-org/llama.cpp/pull/14654
Co-authored-by: anikifoss <anikifoss>

* Add Hunyuan MoE
* Don't reshape Vcur
* Apply chat template fix from mainline PR14584

* Special handling of Seed Coder FIM tokens
* vocab: Add Seed Coder pretokenizer
* Formatting fix
* Update llama.h

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>

* Add DRY sampler
* Use vocab instead of model in dry_init function
* Fix compile error for build test
---------
Co-authored-by: firecoperana <firecoperana>

* llama4: WIP
* llama4: this seems to be working
---------
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>

Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>

* Merge mainline
* Fix after merge
* Remove CI check
---------
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>

* Merging mainline - WIP
* Merging mainline - WIP
AVX2 and CUDA appear to work.
CUDA performance seems slightly (~1-2%) lower, as is so often the case with llama.cpp/ggml after some "improvements" have been made.
* Merging mainline - fix Metal
* Remove check
---------
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>