ik_llama.cpp.git, branch main
Commit log for /ggml-cuda.cu (50 most recent commits)

Age         Commit message  [Author]
2024-01-03  cuda : simplify expression  [Georgi Gerganov]
2024-01-03  cuda : mark I16 and I32 ops as unsupported  [Georgi Gerganov]
2023-12-30  CUDA: fixed tensor cores not being used on RDNA3 (#4697)  [Johannes Gäßler]
2023-12-29  CUDA: fix tensor core logic for Pascal and HIP (#4682)  [Johannes Gäßler]
2023-12-29  cuda: fix vmm oom issue on NVIDIA AGX Orin (#4687)  [hydai]
2023-12-29  ggml : fix some mul mat cases + add tests for src1 F16 (ggml/669)  [bssrdf]
2023-12-26  cuda : fix vmm pool with multi GPU (#4620)  [slaren]
2023-12-26  Fix new CUDA10 compilation errors (#4635)  [FantasyGmm]
2023-12-24  cuda : improve cuda pool efficiency using virtual memory (#4606)  [slaren]
2023-12-23  fallback to CPU buffer if host buffer alloc fails (#4610)  [slaren]
2023-12-23  CUDA: fixed row rounding for 0 tensor splits (#4594)  [Johannes Gäßler]
2023-12-22  sync : ggml (fix im2col) (#4591)  [Georgi Gerganov]
2023-12-22  cuda : fix jetson compile error (#4560)  [FantasyGmm]
2023-12-22  Fix CudaMemcpy direction (#4599)  [Henrik Forstén]
2023-12-22  llama : fix platforms without mmap (#4578)  [slaren]
2023-12-21  ggml : change ggml_scale to take a float instead of tensor (#4573)  [Georgi Gerganov]
2023-12-21  llama : initial ggml-backend integration (#4520)  [slaren]
2023-12-21  cuda : ROCm AMD Unified Memory Architecture (UMA) handling (#4449)  [Erik Garrison]
2023-12-21  ggml-cuda: Fix HIP build by adding define for __trap (#4569)  [arlo-phoenix]
2023-12-21  CUDA: mul_mat_id always on GPU for batches >= 32 (#4553)  [Johannes Gäßler]
2023-12-21  cuda : better error message for ggml_get_rows (#4561)  [bobqianic]
2023-12-21  cuda : replace asserts in wrong architecture checks with __trap (#4556)  [slaren]
2023-12-21  Fix access violation in ggml_cuda_free_data if tensor->extra is NULL (#4554)  [LoganDark]
2023-12-20  CUDA: Faster Mixtral prompt processing (#4538)  [Johannes Gäßler]
2023-12-18  ggml-cuda: Fix HIP build (#4528)  [arlo-phoenix]
2023-12-18  llama : add phi-2 + fix NeoX rope + ggml_mul_mat_set_prec (#4490)  [Ebey Abraham]
2023-12-14  ggml : use ggml_row_size where possible (#4472)  [slaren]
2023-12-13  sync : ggml (SD ops, tests, kernels) (#4444)  [Georgi Gerganov]
2023-12-13  llama : add Mixtral support (#4406)  [slaren]
2023-12-07  sync : ggml (new ops, tests, backend, etc.) (#4359)  [Georgi Gerganov]
2023-12-07  llama : per-layer KV cache + quantum K cache (#4309)  [Georgi Gerganov]
2023-12-01  ggml : add ggml_soft_max_ext (#4256)  [Georgi Gerganov]
2023-11-24  ggml-cuda : support stablelm rope (#4156)  [slaren]
2023-11-23  Fix incorrect format strings and uninitialized variables. (#4133)  [Haohui Mai]
2023-11-18  Clean up ggml-cuda.cu warnings when compiling with clang (for ROCM) (#4124)  [Kerfuffle]
2023-11-17  cuda : get_row_rounding F32 (#4095)  [Andrew Godfrey]
2023-11-17  llama : fix data units (#4101)  [Georgi Gerganov]
2023-11-15  ggml-cuda : increase max graph size (#4084)  [slaren]
2023-11-13  ggml : sync (im2col, GPU conv, 32-bit arm compat) (#4060)  [Georgi Gerganov]
2023-11-13  sync : ggml (backend v2) (#3912)  [Georgi Gerganov]
2023-11-13  Add ReLU and SQR CUDA ops to (partially) fix Persimmon offloading (#4041)  [Kerfuffle]
2023-11-07  cuda : supports running on CPU for GGML_USE_CUBLAS=ON build (#3946)  [Meng Zhang]
2023-11-05  ggml-cuda : fix f16 mul mat (#3961)  [slaren]
2023-11-05  cuda : fix disabling device with --tensor-split 1,0 (#3951)  [Jared Van Bortel]
2023-11-05  cuda : revert CUDA pool stuff (#3944)  [slaren]
2023-11-03  ggml-cuda : move row numbers to x grid dim in mmv kernels (#3921)  [slaren]
2023-11-02  cuda : add ROCM aliases for CUDA pool stuff (#3918)  [Kerfuffle]
2023-11-02  cuda : fix const ptrs warning causing ROCm build issues (#3913)  [Georgi Gerganov]
2023-11-02  cuda : use CUDA memory pool with async memory allocation/deallocation when av...  [Oleksii Maryshchenko]
2023-11-02  cuda : check if this fixes Pascal card regression (#3882)  [Georgi Gerganov]