2024-02-19  Resolve ErrorIncompatibleDriver with Vulkan on MacOS. (Mathijs de Bruin)
Refs:
- https://chat.openai.com/share/7020ce72-65fc-45ec-b7be-9d9d798a5f3f
- https://github.com/SaschaWillems/Vulkan/issues/954
- https://github.com/haasn/libplacebo/issues/128
- https://github.com/KhronosGroup/Vulkan-Samples/issues/476

2024-02-19  Allow for Vulkan build with Accelerate. (Mathijs de Bruin)
Closes #5304

2024-02-19  cuda : ignore peer access already enabled errors (#5597) (slaren)
* cuda : ignore peer access already enabled errors
* fix hip

2024-02-19  make : pass CPPFLAGS directly to nvcc, not via -Xcompiler (#5598) (Jared Van Bortel)

2024-02-19  examples : support minItems/maxItems in JSON grammar converter (#5039) (nopperl)
* support minLength and maxLength in JSON schema grammar converter
* Update examples/json-schema-to-grammar.py
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

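For illustration, a minimal sketch of exercising the new array-size keywords with the converter (assuming the script accepts a schema file path and prints a GBNF grammar on stdout; the schema and file names here are hypothetical):

```console
$ cat tags-schema.json
{
  "type": "object",
  "properties": {
    "tags": {
      "type": "array",
      "items": { "type": "string" },
      "minItems": 1,
      "maxItems": 4
    }
  },
  "required": ["tags"]
}
$ python examples/json-schema-to-grammar.py tags-schema.json > tags.gbnf
```

The resulting grammar would then only admit arrays with one to four string elements.
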
2024-02-19  llava : remove extra cont (#5587) (Georgi Gerganov)

2024-02-19  llava : replace ggml_cpy with ggml_cont (slaren)

2024-02-19  sync : ggml (Georgi Gerganov)
ggml-ci

2024-02-19  ggml-alloc : apply ggml/731 (Georgi Gerganov)

2024-02-19  metal : option to embed MSL source into compiled binary (whisper/1842) (Didzis Gosko)
* ggml : embed Metal library source (ggml-metal.metal) into binary; enable by setting WHISPER_EMBED_METAL_LIBRARY
* rename the build option
* rename the preprocessor directive
* generate Metal library embedding assembly on-the-fly during build process

2024-02-19  ci : enable -Werror for CUDA builds (#5579) (Georgi Gerganov)
* cmake : pass -Werror through -Xcompiler (ggml-ci)
* make, cmake : enable CUDA errors on warnings (ggml-ci)

2024-02-19  make : fix CUDA build (#5580) (Georgi Gerganov)

2024-02-19  readme : fix typo in README-sycl.md (#5353) (valiray)

2024-02-19  cmake : remove obsolete sycl compile flags (#5581) (Abhilash Majumder)
* rm unwanted sycl compile options
* fix bug
* fix bug
* format fix

2024-02-19  minor : fix trailing whitespace (#5538) (Georgi Gerganov)

2024-02-19  llava : avoid changing the original BakLLaVA model (#5577) (Daniel Bevenius)
This is a follow-up of commit fc0c8d286a533363a9a663510b62af85ffad58b3 ("llava : update surgery script to not remove tensors"), but this time the change is to the BakLLaVA-specific part of the surgery script. I've been able to test this using SkunkworksAI/BakLLaVA-1 and it works as expected using the instructions in README.md.
Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>

2024-02-19  baby-llama : allocate graphs in ggml_context (#5573) (NawafAlansari)
* Fixed the baby-llama issue (see issue #4830)
* minor : fix whitespaces
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

2024-02-19  llama : add llama_chat_apply_template() (#5538) (Xuan Son Nguyen)
* llama: add llama_chat_apply_template
* test-chat-template: remove redundant vector
* chat_template: do not use std::string for buffer
* add clarification for llama_chat_apply_template
* llama_chat_apply_template: add zephyr template
* llama_chat_apply_template: correct docs
* llama_chat_apply_template: use term "chat" everywhere
* llama_chat_apply_template: change variable name to "tmpl"

2024-02-19  cuda, metal : fix nans in soft_max (#5574) (slaren)
* cuda : fix nans in soft_max
* metal : fix nans in soft_max
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

2024-02-19  readme : update (#5572) (Mirko185)
Added 1.5-bit on README.md

2024-02-19  ggml : android and old glibc NUMA incompatibility bugfixes (#5557) (bmwl)
* #ifdef out some code NUMA blocks for Android due to lack of support
* added in some __ANDROID__ if def gates around numa code and forced GLIBC prior to 2.29 to use a syscall for getcpu instead of the wrapper
* Changed gates on numa platform specific stuff to __gnu_linux__ to skip any platforms without glibc
* harmonizing #if defined blocks for numa code to __gnu_linux__ since that's the only model that's being followed anyways
Co-authored-by: root <root@nenya.lothlorien.ca>

2024-02-18  build : pass all warning flags to nvcc via -Xcompiler (#5570) (Jared Van Bortel)
* build : pass all warning flags to nvcc via -Xcompiler
* make : fix apparent mis-merge from #3952
* make : fix incorrect GF_CC_VER for CUDA host compiler

2024-02-18  ggml : restore vec dot stride arg names (#5453) (Georgi Gerganov)

2024-02-18  ci : fix wikitext url + compile warnings (#5569) (Georgi Gerganov)
ggml-ci

2024-02-18  metal : fix unused warnings (#0) (Georgi Gerganov)

2024-02-18  common, server : surface min_keep as its own parameter (#5567) (Robey Holderith)
* Feature - surface min_keep as its own parameter
* Updated README with min_keep param

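A hedged sketch of passing the newly surfaced parameter to the completion endpoint (assumes a server listening on localhost:8080; prompt and values are illustrative):

```console
$ curl -s http://localhost:8080/completion -d '{
    "prompt": "Building a website can be done in 10 simple steps:",
    "n_predict": 32,
    "min_keep": 5
  }'
```
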
2024-02-18  server : slots monitoring endpoint (#5550) (Pierrick Hymbert)

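A usage sketch, assuming the monitoring endpoint is exposed at /slots on a locally running server:

```console
$ curl -s http://localhost:8080/slots
```
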
2024-02-18  sampling : do not set min_keep to n_probs (#5564) (Georgi Gerganov)

2024-02-18  cmake : fix GGML_USE_SYCL typo (#5555) (Georgi Gerganov)

2024-02-18  server : enhanced health endpoint (#5548) (Pierrick Hymbert)
* server: enrich health endpoint with available slots, return 503 if no slots are available
* server: document the new "no slot available" status in the README.md

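For illustration, a minimal probe of the enriched endpoint (default host/port assumed):

```console
# expect 200 with slot availability info, or 503 when no slot is available (per the commit message)
$ curl -i http://localhost:8080/health
```
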
2024-02-18  server : --n-predict option document and cap to max value (#5549) (Pierrick Hymbert)
* server: document --n-predict
* server: ensure client request cannot override n_predict if set
* server: fix print usage LF in new --n-predict option

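A hedged launch example showing the cap (binary name and model path are placeholders):

```console
$ ./server -m models/7B/ggml-model-q4_0.gguf --n-predict 512
```

With this set, a client-supplied n_predict larger than 512 is capped rather than honored, per the commit message.
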
2024-02-18  server : graceful server shutdown (#5244) (Daniel Hiltgen)
This updates the server queue to support graceful shutdown of the server on signals.

2024-02-18  common : fix ub (#5530) (Georgi Gerganov)

2024-02-18  ggml, common, examples, tests : fixed type arguments in printf (#5528) (Herman Semenov)

2024-02-18  llava : update surgery script to not remove tensors (#5536) (Daniel Bevenius)
This commit updates the surgery script to not remove the tensors from the model file. For this to work, the `--skip-unknown` flag is added as an argument to the convert.py script in README.md.

The motivation for this change is that the surgery script currently removes the projector tensors from the model file. If the model was checked out from a repository, the model file will have been modified and has to be checked out again to reset this effect. If this can be avoided, I think it would be preferable.

I did not perform this change for BakLLaVA models as I am not sure how that part works.
Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>

2024-02-18  1.5 bit quantization (#5453) (Kawrakow)
* iq1_s: WIP basics
* iq1_s: CUDA is working
* iq1_s: scalar CPU dot product
* iq1_s: WIP AVX2 dot product - something is not right
* Fix tests
* Fix shadow warnings
* Fix after merge with latest master
* iq1_s: AVX2 finally works
* iq1_s: ARM_NEON dot product. Works, but not very fast
* iq1_s: better grid
* iq1_s: use IQ2_XXS for attn_output. At a cost of 0.04 extra bpw this gives a big improvement in PPL.
* iq1_s: Metal basics. Dequantize works, but not dot product.
* iq1_s: Metal works, but quite slow. As usual, Apple Silicon does not like the code I write.
* iq1_s: Tests
* iq1_s: slightly faster dot product
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>

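A rough usage sketch (assuming the quantize tool accepts the new IQ1_S type name; paths are placeholders):

```console
$ ./quantize models/7B/ggml-model-f16.gguf models/7B/ggml-model-iq1_s.gguf IQ1_S
```
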
2024-02-18  flake.lock: Update (github-actions[bot])
Flake lock file updates:
• Updated input 'nixpkgs':
  'github:NixOS/nixpkgs/f8e2ebd66d097614d51a56a755450d4ae1632df1' (2024-02-07)
  → 'github:NixOS/nixpkgs/5863c27340ba4de8f83e7e3c023b9599c3cb3c80' (2024-02-16)

2024-02-17  ggml : add ALiBi support for ggml_soft_max_ext (#5488) (Georgi Gerganov)
* ggml : avoid recomputing alibi slopes (CPU)
* llama : reuse hparams.f_max_alibi_bias in all cases (ggml-ci)
* ggml : support alibi bias in ggml_soft_max_ext (CPU + Metal) (ggml-ci)
* ggml : handle all SRCs (do not break on first null) (ggml-ci)
* tests : do not use slope for large soft_max; accumulates too much error (ggml-ci)
* ggml : alternative ALiBi without extra tensor. We compute the slopes in the kernel (ggml-ci)
* cuda : add ALiBi support in ggml_soft_max_ext (ggml-ci)
* ggml : deprecate ggml_alibi
* ggml : support multi-sequence ALiBi (Metal) (ggml-ci)
* cuda : add multi-seq ALiBi + remote F16 soft_max (ggml-ci)
* ggml : update deprecation message
* ggml : fix pos ptr when no ALiBi (ggml-ci)
* cuda : fix performance (pow -> powf)
* cuda : precompute ALiBi constants
* metal : pre-compute ALiBi slopes (ggml-ci)
* llama : init kq_pos only if needed (ggml-ci)
* test-backend-ops : add null pos test to soft_max; replace soft_max tests (ggml-ci)
Co-authored-by: slaren <slarengh@gmail.com>

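For reference, the bias that the fused soft_max applies is the standard ALiBi formulation (a sketch; H attention heads, scale s, query position i, key position j ≤ i):

```latex
m_h = 2^{-8h/H}, \qquad
p_{ij} = \operatorname{softmax}_j\left( s\, q_i^{\top} k_j - m_h\,(i - j) \right)
```
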
2024-02-17  ci : add an option to fail on compile warning (#3952) (Ananta Bastola)
* feat(ci): add an option to fail on compile warning
* Update CMakeLists.txt
* minor : fix compile warnings (ggml-ci)
* ggml : fix unreachable code warnings (ggml-ci)
* ci : disable fatal warnings for windows, ios and tvos
* ggml : fix strncpy warning
* ci : disable fatal warnings for MPI build
* ci : add fatal warnings to ggml-ci (ggml-ci)
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

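A hedged opt-in example (assuming the CMake option introduced here is named LLAMA_FATAL_WARNINGS):

```console
$ cmake -B build -DLLAMA_FATAL_WARNINGS=ON
$ cmake --build build --config Release
```
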
2024-02-17  gitignore : update for CLion IDE (#5544) (clibdev)

2024-02-16  cmake : fix VULKAN and ROCm builds (#5525) (Georgi Gerganov)
* cmake : fix VULKAN and ROCm builds
* cmake : fix (cont)
* vulkan : fix compile warnings (ggml-ci)
* cmake : fix (ggml-ci)
* cmake : minor (ggml-ci)

2024-02-16  scripts : add helpers script for bench comparing commits (#5521) (Georgi Gerganov)
* scripts : add helpers script for bench comparing commits
* scripts : detect CUDA
* set flags after checking the command line
* fix make flags
Co-authored-by: slaren <slarengh@gmail.com>

2024-02-16  llava : removed excess free(NULL) operation (#5531) (Herman Semenov)

2024-02-16  llama : minor fixed return int value (#5529) (Herman Semenov)

2024-02-16  server : add "samplers" param to control the samplers order (#5494) (Alexey Parfenov)

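A sketch of sending the new field (server on localhost:8080 assumed; the sampler identifiers listed are an assumption based on the existing sampling stages and may differ from the accepted names):

```console
$ curl -s http://localhost:8080/completion -d '{
    "prompt": "Once upon a time",
    "n_predict": 64,
    "samplers": ["top_k", "tfs_z", "typical_p", "top_p", "min_p", "temperature"]
  }'
```
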
2024-02-16  server : fix system prompt cli (#5516) (Rőczey Barnabás)

2024-02-16  ggml : add numa options (#5377) (bmwl)
* Added numa options to allow finer grained control as well as plumbing for a new mirror mode that will require numa.h
* Reverted Makefile
* Fixed include
* Removed sched.h from ggml.h, moved ggml_get_numa_affinity into ggml.c, removed trailing whitespace and fixed up a few inconsistent variables
* removed trailing whitespace
* Added numa options to allow finer grained control as well as plumbing for a new mirror mode that will require numa.h
* Reverting Makefile
* Fixed a number of issues with the move from BOOL to ggml_numa_strategies. Added a note about mirror mode not being implemented yet
* Removing MIRROR_MODE code for this PR
* Removing last bit of MIRROR_MODE code for this PR
* Removing unneeded branch in server.cpp example and moving get_numa_affinity and making it static
* Fixed lingering init_llama_backend() bool calls in tests and examples
* Remote enum llama_numa_strategies
* Revert bad merge with dynatemp flags
* add missing enum ggml_numa_strategies declaration and revert sync problem with master
* add missing enum ggml_numa_strategies declaration
* fixed ggml_init_numa variable
* Update ggml.h (Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>)
* Update READMEs with info about numa flags, change INTERLEAVE strategy name to DISTRIBUTE everywhere, implement the improved distribution strategy from @rankaiyx, fix a spelling mistake and un-merge some bad merges
* split numa init out from llama_backend_init and created llama_numa_init. Updated all code paths and samples
* Fix up some boolean vs enum comparisons
* Added #ifdefs for non-Linux OS that don't have cpu_set_t datatype
* Update ggml.h: align enum values (Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>)
* Update ggml.c: remove whitespace (Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>)
* Update ggml.c: align parameters (Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>)
* Update examples/server/server.cpp: remove whitespace and align brace (Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>)
* Update common/common.cpp: remove whitespace and align brace (Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>)
* unified ggml_numa_strategy enum and fixed text alignment in server.cpp example
* Update ggml.c: simplified return for platforms without NUMA support (Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>)
* removed redundant else from cli argument processing of --numa
* whitespace
Co-authored-by: root <root@nenya.lothlorien.ca>
Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: Jared Van Bortel <jared@nomic.ai>

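A hedged command-line sketch using the DISTRIBUTE strategy mentioned above (binary name and model path are placeholders; see the commit for the full set of strategies):

```console
$ ./main -m models/7B/ggml-model-q4_0.gguf --numa distribute -p "Hello"
```
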
2024-02-16  llava : fix clip-model-is-vision flag in README.md (#5509) (Daniel Bevenius)
* llava: fix clip-model-is-vision flag in README.md
  This commit fixes the flag `--clip_model_is_vision` in README.md, which does not match the actual flag:
```console
$ python convert-image-encoder-to-gguf.py --help
...
  --clip-model-is-vision
                        The clip model is a pure vision model (ShareGPT4V vision extract for example)
```
* llava: update link to vit config in README.md
Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>

2024-02-16  ci : fix BERT model download and convert (Georgi Gerganov)

2024-02-15  Use correct type of pooling for embedding models (#5500) (Douglas Hanley)