cuda : improve cuda pool efficiency using virtual memory (#4606)

* cuda : improve cuda pool efficiency using virtual memory
* fix mixtral
* fix cmake build
* check for vmm support, disable for hip
  ggml-ci
* fix hip build
* clarify granularity
* move all caps to g_device_caps
* refactor error checking
* add cuda_pool_alloc, refactor most pool allocations
  ggml-ci
* fix hip build
* CUBLAS_TF32_TENSOR_OP_MATH is not a macro
* more hip crap
* llama : fix msvc warnings
* ggml : fix msvc warnings
* minor
* minor
* cuda : fallback to CPU on host buffer alloc fail
* Update ggml-cuda.cu
  Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
* Update ggml-cuda.cu
  Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
* ensure allocations are always aligned
* act_size -> actual_size

---------

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
author: slaren <slarengh@gmail.com> 2023-12-24 14:34:22 +0100
committer: GitHub <noreply@github.com> 2023-12-24 14:34:22 +0100
commit: 5bf3953d7e9831ea22b0bc017ce97409b801ccf1 (patch)
tree: 48c0136d9943fb9cca22209894464970549c24b5 /ggml.h
parent: 708e179e8562c2604240df95a2241dea17fd808b (diff)
1 files changed, 2 insertions, 0 deletions
diff --git a/ggml.h b/ggml.h
index 338f355a..67d6bc4f 100644
--- a/ggml.h
+++ b/ggml.h
@@ -255,6 +255,8 @@
 #define GGML_UNREACHABLE() GGML_ASSERT(!"statement should not be reached")
 #elif defined(__GNUC__)
 #define GGML_UNREACHABLE() __builtin_unreachable()
+#elif defined(_MSC_VER)
+#define GGML_UNREACHABLE() __assume(0)
 #else
 #define GGML_UNREACHABLE() ((void) 0)
 #endif
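
For context, the hunk above adds an MSVC branch to the GGML_UNREACHABLE() macro chain. The sketch below shows the kind of call site such a macro is typically used in. It is an illustration only: MY_UNREACHABLE, shape_kind, and shape_sides are hypothetical names that do not appear in ggml.h, and the debug branch that expands to GGML_ASSERT is left out. The likely motivation (an assumption based on the "fix msvc warnings" entries in the commit message) is that without a compiler hint, MSVC may warn that not all control paths return a value.

```c
#include <stdio.h>

/* Stand-in for the macro chain in the hunk above (hypothetical name;
 * the assert-based debug branch is intentionally omitted). */
#if defined(__GNUC__)
#define MY_UNREACHABLE() __builtin_unreachable()
#elif defined(_MSC_VER)
#define MY_UNREACHABLE() __assume(0)
#else
#define MY_UNREACHABLE() ((void) 0)
#endif

/* Hypothetical enum used only for this illustration. */
enum shape_kind { SHAPE_CIRCLE, SHAPE_SQUARE };

/* Every enumerator returns inside the switch, so the code after the
 * switch can only run if k holds an invalid value. The macro tells the
 * compiler it may assume that never happens. */
static int shape_sides(enum shape_kind k) {
    switch (k) {
        case SHAPE_CIRCLE: return 0;
        case SHAPE_SQUARE: return 4;
    }
    MY_UNREACHABLE();
}
```

With valid input, shape_sides(SHAPE_SQUARE) returns 4. Passing a value outside the enum is undefined behavior on the GCC and MSVC branches, which is why a debug build would typically prefer an assert-based expansion instead.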