Age | Commit message | Author
2024-02-05 | ggml : make use of ggml-quants.h possible in C++ code (#5338) | Kawrakow

* Make use of ggml-quants.h possible in C++ code
* One cannot possibly be defining static_assert in a C++ compilation
---------
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
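The static_assert note refers to C++ reserving static_assert as a keyword: a C header may only provide a fallback macro when compiled as C. A minimal sketch of such a guard (hypothetical names, not the actual ggml-quants.h contents):

```cpp
// Hypothetical sketch: in C++, static_assert is a keyword, so a C header
// must confine any fallback macro definition to C-only compilation for
// the header to remain usable from C++ translation units.
#ifndef __cplusplus
  #if !defined(__STDC_VERSION__) || __STDC_VERSION__ < 201112L
    // pre-C11 C fallback: a negative array size forces a compile error
    #define static_assert(cond, msg) typedef char static_assert_fallback[(cond) ? 1 : -1]
  #endif
#endif

#include <cstdint>

// In a C++ build the guard above expands to nothing and the keyword is used.
static_assert(sizeof(std::int32_t) == 4, "int32_t must be 4 bytes");

int main() { return 0; }
```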
2024-02-05 | ggml : avoid duplicating function calls using MIN/MAX macros (#5325) | Dr. Tom Murphy VII Ph.D

* Avoid duplicating function calls when using MIN/MAX macros. Since these copy "a" and "b", they ask the compiler to evaluate one of them twice. The compiler doesn't have a problem with removing the duplication in something like MAX(0, x + 2), but in some cases we're calling functions, and those calls just happen twice. By explicitly evaluating the expression first, we get smaller and faster code without duplicate calls. See ggml_rope_yarn_corr_dims in Compiler Explorer: https://godbolt.org/z/Ee4KMrvKh. Code behaves exactly the same.
* Update ggml.c
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
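The pitfall is easy to reproduce. A self-contained sketch of the double evaluation and the fix the commit describes (evaluate once into a local, then apply the macro); the names here are illustrative, not the ggml code:

```cpp
#include <cstdio>

// Classic double-evaluation pitfall: MAX(a, b) expands both arguments,
// so any function call placed inside it runs twice.
#define MAX(a, b) ((a) > (b) ? (a) : (b))

static int calls = 0;
static int expensive(void) { ++calls; return 42; }

int main() {
    int m = MAX(0, expensive());              // expensive() is evaluated twice
    std::printf("m=%d calls=%d\n", m, calls); // calls == 2

    // The commit's approach: evaluate the expression once, then take the max.
    calls = 0;
    const int v = expensive();
    int n = MAX(0, v);                        // only one call now
    std::printf("n=%d calls=%d\n", n, calls); // calls == 1
    return 0;
}
```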
2024-02-05 | iq3_xxs: guards for the no-imatrix situation (#5334) | Kawrakow
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2024-02-05 | py : fix internlm2-hf convert to gguf (#5305) | Guoteng

* py : fix internlm2-hf convert to gguf
* ggml-ci
2024-02-05 | iq2_xxs: tune quantization (#5320) | Kawrakow

We get slightly better PPL, and we cut quantization time in nearly half. The trick is to first quantize without forcing points onto the E8-lattice. We can then use a narrower search range around the block scale that we got that way.
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
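A toy sketch of the two-pass idea, under the assumption that the constraint can be modeled as snapping to a coarser grid (the real code snaps to the E8 lattice; all names and ranges here are made up for the example):

```cpp
#include <cmath>
#include <cstdio>
#include <vector>

// Squared error of quantizing a block at a given scale; the "round to
// even integers" branch is a stand-in for the real lattice constraint.
static float block_error(const std::vector<float> & x, float scale, bool constrained) {
    float err = 0.0f;
    for (float v : x) {
        float q = std::round(v / scale);
        if (constrained) q = 2.0f * std::round(q / 2.0f); // stand-in lattice snap
        const float d = v - q * scale;
        err += d * d;
    }
    return err;
}

int main() {
    const std::vector<float> x = {0.9f, -1.8f, 3.7f, 0.2f, -2.9f, 1.1f, 4.2f, -0.6f};

    // pass 1: wide, cheap search with the constraint disabled
    float best_scale = 0.05f;
    float best_err   = block_error(x, best_scale, false);
    for (float s = 0.05f; s <= 2.0f; s += 0.05f) {
        const float e = block_error(x, s, false);
        if (e < best_err) { best_err = e; best_scale = s; }
    }

    // pass 2: constrained search, but only in a narrow window around the
    // unconstrained optimum - far fewer candidates than a full sweep
    float final_scale = 0.8f * best_scale;
    float final_err   = block_error(x, final_scale, true);
    for (float s = 0.8f * best_scale; s <= 1.2f * best_scale; s += 0.01f) {
        const float e = block_error(x, s, true);
        if (e < final_err) { final_err = e; final_scale = s; }
    }
    std::printf("unconstrained scale %.3f -> constrained scale %.3f (err %.4f)\n",
                best_scale, final_scale, final_err);
    return 0;
}
```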
2024-02-05 | server : allow getting default generation settings for completion (#5307) | Alexey Parfenov
2024-02-05 | common : add dynamic temperature parameters to main example cli (#5295) | l3utterfly

* added dynamic temp params in main
* added help text
2024-02-05 | scripts : fix typos, cleanup (#5303) | Georgi Gerganov
2024-02-05 | scripts : add non-interactive server-llm.sh (#5303) | Нияз Гарифзянов

* Update server-llm.sh: add flag --non-interactive that allows running the script without asking for permission
* Update scripts/server-llm.sh
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-02-05 | readme : add CodeShell models to the supported models list (#5330) | chiranko
2024-02-05 | [SYCL] Fix cpy with dims of 3 (#5289) | AidanBeltonS

* Fix cpy with dims of 3
* rm asserts
---------
Co-authored-by: Abhilash Majumder <30946547+abhilash1910@users.noreply.github.com>
2024-02-04 | flake.lock: Update | github-actions[bot]

Flake lock file updates:
• Updated input 'flake-parts': 'github:hercules-ci/flake-parts/07f6395285469419cf9d078f59b5b49993198c00' (2024-01-11) → 'github:hercules-ci/flake-parts/b253292d9c0a5ead9bc98c4e9a26c6312e27d69f' (2024-02-01)
• Updated input 'flake-parts/nixpkgs-lib': 'github:NixOS/nixpkgs/b0d36bd0a420ecee3bc916c91886caca87c894e9?dir=lib' (2023-12-30) → 'github:NixOS/nixpkgs/97b17f32362e475016f942bbdfda4a4a72a8a652?dir=lib' (2024-01-29)
• Updated input 'nixpkgs': 'github:NixOS/nixpkgs/ae5c332cbb5827f6b1f02572496b141021de335f' (2024-01-25) → 'github:NixOS/nixpkgs/b8b232ae7b8b144397fdb12d20f592e5e7c1a64d' (2024-01-31)
2024-02-04 | Adding some imatrix tools (#5302) | Kawrakow

* imatrix: adding --combine and --continue-from
* imatrix: be able to start from a specific chunk
---------
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2024-02-03 | cmake : use set() for LLAMA_WIN_VER (#5298) | Welby Seely
option() is specifically for booleans. Fixes #5158
2024-02-03 | make: add nvcc info print (#5310) | Johannes Gäßler
2024-02-03 | make: fix nvcc optimization flags for host code (#5309) | Johannes Gäßler
2024-02-03 | add Vulkan support to Nix flake | Martin Schwaighofer
2024-02-03 | Vulkan Intel Fixes, Optimizations and Debugging Flags (#5301) | 0cc4m

* Fix Vulkan on Intel ARC
  Optimize matmul for Intel ARC
  Add Vulkan dequant test
* Add Vulkan debug and validate flags to Make and CMakeLists.txt
* Enable asynchronous transfers in Vulkan backend
* Fix flake8
* Disable Vulkan async backend functions for now
* Also add Vulkan run tests command to Makefile and CMakeLists.txt
2024-02-03 | refactor : switch to emplace_back to avoid extra object (#5291) | Michael Klimenko
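For reference, the difference this refactor exploits, sketched minimally:

```cpp
#include <string>
#include <vector>

// push_back(T(...)) constructs a temporary and then moves (or copies)
// it into the vector, while emplace_back(...) forwards the arguments
// and constructs the element in place.
int main() {
    std::vector<std::string> v;
    v.push_back(std::string(64, 'x')); // temporary + move
    v.emplace_back(64, 'x');           // constructed directly in the vector
    return v.size() == 2 ? 0 : 1;
}
```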
2024-02-03 | YaRN : store rope scaling type as int32_t in memory (#5285) | Jared Van Bortel

* YaRN : store rope scaling type as int32_t in memory
* llama : store mapped names as const char *
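A sketch of the rationale, with illustrative names rather than the actual llama.cpp definitions: an enum's underlying type is implementation-defined, so a struct whose layout matters stores the value with an explicit fixed width instead of the enum type itself.

```cpp
#include <cstdint>

// Illustrative stand-ins, not the real llama.cpp enum or hparams struct.
enum toy_rope_scaling_type { TOY_ROPE_SCALING_NONE = 0, TOY_ROPE_SCALING_LINEAR = 1, TOY_ROPE_SCALING_YARN = 2 };

struct toy_hparams {
    std::int32_t rope_scaling_type; // always 4 bytes, regardless of compiler
};

int main() {
    toy_hparams hp{};
    hp.rope_scaling_type = TOY_ROPE_SCALING_YARN;
    return hp.rope_scaling_type == 2 ? 0 : 1;
}
```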
2024-02-03 | readme : add tenere to the UI tools list (#5284) | BADR
2024-02-03 | Fix im2col with fp32 (#5286) | AidanBeltonS
2024-02-02 | perplexity : fix KL divergence calculations on Windows (#5273) | kalomaze
2024-02-02 | scripts : parse wtype in server-llm.sh (#5167) | Georgi Gerganov

* scripts : parse wtype in server-llm.sh
* scripts : fix check for wfile
2024-02-02 | py : add check for '.attn.masked_bias' layers to GPT2model (#5281) | Mirror Azure
2024-02-02 | Tidy ggml-sycl (#5261) | AidanBeltonS

* Tidy some code in ggml-sycl
* Remove blank space
* Remove std::printf comments
---------
Co-authored-by: Abhilash Majumder <30946547+abhilash1910@users.noreply.github.com>
2024-02-02 | docker : add build for SYCL, Vulkan + update readme (#5228) | Xuan Son Nguyen

* add vulkan dockerfile
* intel dockerfile: compile sycl by default
* fix vulkan dockerfile
* add docs for vulkan
* docs: sycl build in docker
* docs: remove trailing spaces
* docs: sycl: add docker section
* docs: clarify install vulkan SDK outside docker
* sycl: use intel/oneapi-basekit docker image
* docs: correct TOC
* docs: correct docker image for Intel oneMKL
2024-02-02 | [SYCL] get MAX_MEM_ALLOC from device property (#5270) | Meng, Hengyu

* get max alloc size from device prop
* fix macro typo
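For reference, SYCL exposes this limit as a standard device info query. A minimal sketch assuming a SYCL 2020 toolchain (not the actual ggml-sycl code):

```cpp
#include <sycl/sycl.hpp>
#include <cstdio>

// Read the per-allocation limit from the device instead of hard-coding
// a macro (standard SYCL 2020 device info query).
int main() {
    sycl::queue q;
    const auto max_alloc = q.get_device().get_info<sycl::info::device::max_mem_alloc_size>();
    std::printf("max single allocation: %llu bytes\n", (unsigned long long) max_alloc);
    return 0;
}
```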
2024-02-02 | [SYCL] update guide of SYCL backend (#5254) | Neo Zhang Jianyu

* update guide for make installation, memory, gguf model link; remove todo for windows build
* add vs install requirement
* update for gpu device check
* update help of llama-bench
* fix grammar issues
2024-02-02 | llama : fix memory leak in llama_batch_free (#5252) | Ian Bull

llama_batch_init allocates memory for a fixed number of tokens, but llama_batch_free only freed memory for the number of tokens that were actually added to the batch. This changeset uses a null-terminated array for the batch seq_id and frees all the elements until the nullptr is reached. It also renames the first parameter from `n_tokens` to `n_tokens_alloc` to more clearly indicate that this value is the number of tokens allocated for the batch, not the number of tokens in it.
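A minimal sketch of the described fix, using simplified stand-in types rather than the real llama.cpp structs:

```cpp
#include <cstdint>
#include <cstdlib>

// Allocate one extra, zeroed slot so the seq_id array is null-terminated,
// then free every row until the sentinel is reached.
struct toy_batch {
    std::int32_t ** seq_id; // null-terminated array of per-token id arrays
};

static toy_batch toy_batch_init(int n_tokens_alloc, int n_seq_max) {
    toy_batch b{};
    // calloc zeroes the memory, so slot n_tokens_alloc stays NULL (sentinel)
    b.seq_id = (std::int32_t **) calloc(n_tokens_alloc + 1, sizeof(std::int32_t *));
    for (int i = 0; i < n_tokens_alloc; ++i) {
        b.seq_id[i] = (std::int32_t *) calloc(n_seq_max, sizeof(std::int32_t));
    }
    return b;
}

static void toy_batch_free(toy_batch & b) {
    // walk to the sentinel: releases every allocated row, not just the
    // rows that ended up holding tokens
    for (int i = 0; b.seq_id[i] != nullptr; ++i) {
        free(b.seq_id[i]);
    }
    free(b.seq_id);
}

int main() {
    toy_batch b = toy_batch_init(512, 1);
    toy_batch_free(b);
    return 0;
}
```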
2024-02-01 | add --no-mmap in llama-bench (#5257) | Neo Zhang Jianyu

* add --no-mmap, show sycl backend
* fix conflict
* fix code format, change print for --no-mmap
* rename no_mmap to mmap; show mmap in the printer only when it differs from the default value
* update guide for mmap
* move position to reduce model reload
2024-02-01 | Vulkan Phi Fix for AMD Proprietary Drivers (#5260) | 0cc4m

* Replace tanh to avoid NaN in gelu shader on AMD proprietary driver
* Fix another Vulkan CPY buffer size bug
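The first fix concerns the tanh-based GELU approximation. The sketch below is a CPU-side illustration only, not the actual shader change: clamping the tanh argument (or rewriting tanh in terms of exp) gives numerically identical results for ordinary inputs, since tanh saturates to ±1 long before the clamp bound, while keeping extreme inputs away from a fragile driver-side tanh.

```cpp
#include <cmath>
#include <cstdio>

static float gelu_tanh(float x) {
    const float c = 0.797884560802865f; // sqrt(2/pi)
    return 0.5f * x * (1.0f + std::tanh(c * (x + 0.044715f * x * x * x)));
}

static float gelu_clamped(float x) {
    const float c = 0.797884560802865f;
    float t = c * (x + 0.044715f * x * x * x);
    t = std::fmax(-10.0f, std::fmin(10.0f, t)); // tanh is already ±1 out here
    return 0.5f * x * (1.0f + std::tanh(t));
}

int main() {
    const float xs[] = {-100.0f, -1.0f, 0.0f, 1.0f, 100.0f};
    for (float x : xs)
        std::printf("x=%8.1f  gelu=%12.6f  clamped=%12.6f\n",
                    x, gelu_tanh(x), gelu_clamped(x));
    return 0;
}
```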
2024-02-01 | cuda : fix LLAMA_CUDA_F16 (#5262) | slaren
2024-02-01 | make : generate .a library for static linking (#5205) | Ali Nehzat
2024-02-01 | llama : support InternLM2 (#5184) | Guoteng

* support InternLM2 inference
* add add_space_prefix KV pair
2024-01-31 | Fix broken Vulkan Cmake (properly) (#5230) | Eve

* build vulkan as object
* vulkan ci
2024-01-31 | llama : reorder build_orion() at correct place (#5118) | Georgi Gerganov
2024-01-31 | llama : remove LLAMA_MAX_DEVICES and LLAMA_SUPPORTS_GPU_OFFLOAD (#5240) | Georgi Gerganov

* llama : remove LLAMA_MAX_DEVICES from llama.h
  ggml-ci
* Update llama.cpp
  Co-authored-by: slaren <slarengh@gmail.com>
* server : remove LLAMA_MAX_DEVICES
  ggml-ci
* llama : remove LLAMA_SUPPORTS_GPU_OFFLOAD
  ggml-ci
* train : remove LLAMA_SUPPORTS_GPU_OFFLOAD
* readme : add deprecation notice
* readme : change deprecation notice to "remove" and fix url
* llama : remove gpu includes from llama.h
  ggml-ci
---------
Co-authored-by: slaren <slarengh@gmail.com>
2024-01-31 | metal : add im2col F32 dst support (#5132) | Georgi Gerganov
2024-01-31 | llava : add MobileVLM support (#5132) | JidongZhang-THU

* New features:
  1. Sum_Rows: fix cuda kernel overflow; fix block shape error when nrows is too big
  2. Im2Col: support batch in cuda; support f32 to f32 in both cpu && cuda
  3. DepthWiseConv: supported via Im2Col && MulMat (see the im2col sketch after this entry)
  4. Pool_2d: support avg pooling in cuda
  5. HardSigmoid: implemented in cuda
  6. HardSwish: implemented in cuda
* fix tabs instead of spaces
* code clean
* CUDA POOL2D
* add POOL2D test case in test-backend-ops.cpp
* code clean
* fix pool2d_kernel nits
* fix bug in pool2d kernel
* fix avg pooling, count_include_pad nits
* test-backend-ops : add more pool_2d tests
* cuda : fix warnings and formatting
* ggml : check types in release builds too in pool_2d
* test-backend-ops : remove f16 pool_2d tests
* cuda : more style fixes
* add assert in ggml_cuda_op_pool2d
* pool2d float padding fallback
* test-backend-ops : add dst_type to im2col
---------
Co-authored-by: slaren <slarengh@gmail.com>
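As referenced in item 3 above, im2col turns a convolution into a matrix multiply: each output row holds one kernel-sized patch, so multiplying the flattened kernel against this matrix performs the convolution. A toy single-channel sketch (assumed row-major layout, stride 1, no padding; not the ggml implementation):

```cpp
#include <cstdio>
#include <vector>

// Unfold a w x h single-channel image into patch rows of size kw x kh.
static std::vector<float> im2col(const std::vector<float> & src,
                                 int w, int h, int kw, int kh) {
    const int ow = w - kw + 1, oh = h - kh + 1;
    std::vector<float> dst((size_t) ow * oh * kw * kh);
    for (int oy = 0; oy < oh; ++oy)
        for (int ox = 0; ox < ow; ++ox)
            for (int ky = 0; ky < kh; ++ky)
                for (int kx = 0; kx < kw; ++kx)
                    dst[(((size_t)(oy*ow + ox))*kh + ky)*kw + kx] =
                        src[(size_t)(oy + ky)*w + (ox + kx)];
    return dst;
}

int main() {
    const std::vector<float> img = {1,2,3, 4,5,6, 7,8,9}; // 3x3 image
    const auto cols = im2col(img, 3, 3, 2, 2);            // four 2x2 patches
    for (size_t i = 0; i < cols.size(); ++i)
        std::printf("%g%c", cols[i], (i % 4 == 3) ? '\n' : ' ');
    return 0;
}
```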
2024-01-31 | format license text, restore Apache license per legal suggestion (#5233) | Neo Zhang Jianyu
2024-01-31 | ggml : limit n_threads to the max n_tasks (#5238) | slaren
2024-01-31 | Vulkan Fixes (#5223) | 0cc4m

* Fix Vulkan F16 models
* Fix Vulkan context shift crash
* Add Vulkan to common.cpp dump_non_result_info_yaml function
* Fix bug in Vulkan CPY op
* Fix small matrix multiplication errors on AMD GPUs on Windows or with amdvlk
  Co-authored-by: Engininja2 <139037756+Engininja2@users.noreply.github.com>
---------
Co-authored-by: Engininja2 <139037756+Engininja2@users.noreply.github.com>
2024-01-30 | Fix typos of IQ2_XXS and IQ3_XXS in llama.cpp (#5231) | Yiming Cui
2024-01-31 | support SYCL backend windows build (#5208) | Neo Zhang Jianyu

* support SYCL backend windows build
* add windows build in CI
* add for win build CI
* correct install oneMKL
* fix install issue
* fix ci
* fix install cmd
* fix install cmd
* fix install cmd
* fix install cmd
* fix install cmd
* fix win build
* fix win build
* fix win build
* restore other CI part
* restore as base
* rm no new line
* fix no-newline issue, add -j
* fix grammar issue
* allow to trigger manually, fix format issue
* fix format
* add newline
* fix format
* fix format
* fix format issues
---------
Co-authored-by: Abhilash Majumder <30946547+abhilash1910@users.noreply.github.com>
2024-01-30 | kompute : llama-bench support and ggml_cpu_has_kompute() (#5226) | Jared Van Bortel
2024-01-30 | Revert "server : change deps.sh xxd files to string literals (#5221)" | Georgi Gerganov
This reverts commit 4003be0e5feef320f3707786f22722b73cff9356.
2024-01-30 | server : fix context shift (#5195) | Georgi Gerganov

* server : fix context shift + simplify self-extend
* server : take system_tokens into account
* server : more n_past fixes
* server : revert n_past_se changes
2024-01-30 | server : change deps.sh xxd files to string literals (#5221) | JohnnyB

* Changed ugly xxd to literals. HPP files are much more readable as multiline literals rather than hex arrays.
* Dashes in literal variable names. Replace . and - with _ in file names -> variable names.
* Comment on removing xxd. XXD -> string literals
* XXD to string literals. Replaced these unreadable headers with string literal versions using new deps.sh.
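To see why the literal form is preferable, compare the two encodings of the same asset; the names and content below are made up for illustration, not the real server files:

```cpp
#include <cstdio>

// xxd-style dump: opaque in review and noisy in diffs
static const unsigned char index_html_hex[] = {
    0x3c, 0x68, 0x74, 0x6d, 0x6c, 0x3e, 0x00 // "<html>" as hex bytes
};

// raw string literal: the same data, readable and diff-friendly
static const char index_html[] = R"(<html>
  <body>hello</body>
</html>)";

int main() {
    std::printf("%s\n%s\n", (const char *) index_html_hex, index_html);
    return 0;
}
```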
2024-01-30 | ggml : fix IQ3_XXS on Metal (#5219) | Kawrakow
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>