Age | Commit message | Author
2024-02-03 | refactor : switch to emplace_back to avoid extra object (#5291) | Michael Klimenko
2024-02-03 | YaRN : store rope scaling type as int32_t in memory (#5285) | Jared Van Bortel
2024-02-03 | readme : add tenere in the ui tools list (#5284) | BADR
2024-02-03 | Fix im2col with 32fp (#5286) | AidanBeltonS
2024-02-02 | perplexity : fix KL divergence calculations on Windows (#5273) | kalomaze
2024-02-02 | scripts : parse wtype in server-llm.sh (#5167) | Georgi Gerganov
2024-02-02 | py : add check for '.attn.masked_bias' layers to GPT2model (#5281) | Mirror Azure
2024-02-02 | Tidy ggml-sycl (#5261) | AidanBeltonS
2024-02-02 | docker : add build for SYCL, Vulkan + update readme (#5228) | Xuan Son Nguyen
2024-02-02 | [SYCL] get MAX_MEM_ALLOC from device property (#5270) | Meng, Hengyu
2024-02-02 | [SYCL] update guide of SYCL backend (#5254) | Neo Zhang Jianyu
2024-02-02 | llama : fix memory leak in llama_batch_free (#5252) | Ian Bull
2024-02-01 | add --no-mmap in llama-bench (#5257) | Neo Zhang Jianyu
2024-02-01 | Vulkan Phi Fix for AMD Proprietary Drivers (#5260) | 0cc4m
2024-02-01 | cuda : fix LLAMA_CUDA_F16 (#5262) | slaren
2024-02-01 | make : generate .a library for static linking (#5205) | Ali Nehzat
2024-02-01 | llama : support InternLM2 (#5184) | Guoteng
2024-01-31 | Fix broken Vulkan Cmake (properly) (#5230) | Eve
2024-01-31 | llama : reorder build_orion() at correct place (#5118) | Georgi Gerganov
2024-01-31 | llama : remove LLAMA_MAX_DEVICES and LLAMA_SUPPORTS_GPU_OFFLOAD (#5240) | Georgi Gerganov
2024-01-31 | metal : add im2col F32 dst support (#5132) | Georgi Gerganov
2024-01-31 | llava : add MobileVLM support (#5132) | JidongZhang-THU
2024-01-31 | format license text, restore apache license by legal suggestion (#5233) | Neo Zhang Jianyu
2024-01-31 | ggml : limit n_threads to the max n_tasks (#5238) | slaren
2024-01-31 | Vulkan Fixes (#5223) | 0cc4m
2024-01-30 | Fix typos of IQ2_XXS and IQ3_XXS in llama.cpp (#5231) | Yiming Cui
2024-01-31 | support SYCL backend windows build (#5208) | Neo Zhang Jianyu
2024-01-30 | kompute : llama-bench support and ggml_cpu_has_kompute() (#5226) | Jared Van Bortel
2024-01-30 | Revert "server : change deps.sh xxd files to string literals (#5221)" | Georgi Gerganov
2024-01-30 | server : fix context shift (#5195) | Georgi Gerganov
2024-01-30 | server : change deps.sh xxd files to string literals (#5221) | JohnnyB
2024-01-30 | ggml : fix IQ3_XXS on Metal (#5219) | Kawrakow
2024-01-30 | sync : ggml (#0) | Georgi Gerganov
2024-01-30 | gguf : fix comparison (ggml/715) | Georgi Gerganov
2024-01-30 | `ggml_cuda_cpy` support for 4d tensors and float16->float32 upcasting (ggml/686) | John Balis
2024-01-30 | gguf : add input validation, prevent integer overflows (ggml/709) | Georgi Gerganov
2024-01-30 | ci : fix yolo URLs + fix metal capture (ggml/712) | Georgi Gerganov
2024-01-30 | metal : add debug capture backend function (ggml/694) | Jack Mousseau
2024-01-30 | Faster AVX2 dot product for IQ2_XS (#5187) | Kawrakow
2024-01-30 | SOTA 3-bit quants (#5196) | Kawrakow
2024-01-30 | Vulkan Windows APU Memory Handling (#5199) | 0cc4m
2024-01-30 | quantize : fix typo (#5211) | Vladimir Malyutin
2024-01-30 | main : allow empty --prompt-cache file (#5176) | divinity76
2024-01-30 | readme : minor (#5204) | Romain Neutron
2024-01-30 | readme : update hot topics | Georgi Gerganov
2024-01-30 | server : improve README (#5209) | Wu Jian Ping
2024-01-29 | ggml alloc: Fix for null dereference on alloc failure (#5200) | Paul Tsochantaris
2024-01-29 | kompute : fix fallback to CPU (#5201) | Jared Van Bortel
2024-01-29 | Nomic Vulkan backend (#4456) | Jared Van Bortel
2024-01-29 | fix typo "RLIMIT_MLOCK" (#5175) | divinity76