author | Kawrakow <iwankawrakow@gmail.com> | 2025-04-29 07:19:43 +0200 |
---|---|---|
committer | GitHub <noreply@github.com> | 2025-04-29 07:19:43 +0200 |
commit | cda24b58cbef34154651d0083910fed860a506c1 (patch) | |
tree | 90cd3bd7f772c3b240a6553eca5e50edf95c53da /ggml/src/iqk/iqk_flash_impl.h | |
parent | baeefb4731fb24cdace168f6dbc74516d470efc0 (diff) |
CPU FA improvements (#351)
* FA: provide work buffer for K repacking
* Add header to avoid compiler warnings
* WIP
* WIP
* WIP
* WIP
* Slightly better
* WIP (Zen4)
* WIP
* Try to improve for unusual number of heads/number of threads
* Use mul_mat_qX_0_q8_2_Tx for q6_0 in FA
* Use mul_mat_qX_0_q8_2_Tx for q4_0 in FA
* Use Sum4q4 for q4_0
* WIP
* WIP
* Much better FA TG with q8_0 KV cache
Just repack it even for TG. But do the repacking for k_step rows,
not the whole K tensor.
---------
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
Diffstat (limited to 'ggml/src/iqk/iqk_flash_impl.h')
-rw-r--r-- | ggml/src/iqk/iqk_flash_impl.h | 4 |
1 files changed, 4 insertions, 0 deletions
diff --git a/ggml/src/iqk/iqk_flash_impl.h b/ggml/src/iqk/iqk_flash_impl.h
index 68802927..6f62e56b 100644
--- a/ggml/src/iqk/iqk_flash_impl.h
+++ b/ggml/src/iqk/iqk_flash_impl.h
@@ -6,6 +6,8 @@
 
 #pragma once
 
+#include <cstdint>
+
 bool iqk_flash_attn_impl(int type_k, // type of k
                          int type_v, // type of v
                          int Dk, // K head size
@@ -27,3 +29,5 @@ bool iqk_flash_attn_impl(int type_k, // type of k
                          float * M,
                          float * S);
 
+void * iqk_repack_k(int type_k, int nek0, int nek1, int nek2, int nek3, long nbk1, long nbk2, long nbk3,
+        const void * k, void * work, int ith, int nth, int& repacked_type, uint64_t& row_size);
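
The commit message describes repacking K in blocks of k_step rows into a work buffer rather than repacking the whole tensor, and the new declaration takes a thread index and thread count (ith, nth). The implementation of iqk_repack_k is not part of this diff, so the following is only a minimal sketch of that general idea. The names repack_row and repack_k_block, the plain byte copy standing in for a real repacking kernel, and the even row split across threads are all assumptions made for illustration, not the project's code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a repacking kernel: it just copies one row.
// A real kernel would convert the row into a different quantized layout.
static void repack_row(const uint8_t * src, uint8_t * dst, size_t row_size) {
    std::memcpy(dst, src, row_size);
}

// Repack rows [first, first + k_step) of K into `work`, clamped to n_rows.
// The rows of the block are split evenly across nth threads; thread ith
// handles its own contiguous slice. Returns the number of rows this thread
// processed. (Assumed scheme, not taken from the actual implementation.)
static int repack_k_block(const uint8_t * k, size_t stride, size_t row_size,
                          int n_rows, int first, int k_step,
                          uint8_t * work, int ith, int nth) {
    const int last       = std::min(first + k_step, n_rows);
    const int n          = last - first;            // rows in this block
    const int per_thread = (n + nth - 1) / nth;     // ceiling division
    const int begin      = std::min(ith * per_thread, n);
    const int end        = std::min(begin + per_thread, n);
    for (int r = begin; r < end; ++r) {
        repack_row(k + static_cast<size_t>(first + r) * stride,
                   work + static_cast<size_t>(r) * row_size, row_size);
    }
    return end - begin;
}
```

In this sketch the work buffer only needs room for k_step rows, and a caller would invoke repack_k_block once per block from every thread (with a barrier before the block is consumed), which is the memory saving the commit message alludes to when it says the repacking is done for k_step rows instead of the whole K tensor.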