author Kawrakow <iwankawrakow@gmail.com> 2025-04-29 07:19:43 +0200
committer GitHub <noreply@github.com> 2025-04-29 07:19:43 +0200
commit cda24b58cbef34154651d0083910fed860a506c1
tree 90cd3bd7f772c3b240a6553eca5e50edf95c53da /convert_hf_to_gguf.py
parent baeefb4731fb24cdace168f6dbc74516d470efc0
CPU FA improvements (#351)
* FA: provide work buffer for K repacking
* Add header to avoid compiler warnings
* WIP
* WIP
* WIP
* WIP
* Slightly better
* WIP (Zen4)
* WIP
* Try to improve for unusual number of heads/number of threads
* Use mul_mat_qX_0_q8_2_Tx for q6_0 in FA
* Use mul_mat_qX_0_q8_2_Tx for q4_0 in FA
* Use Sum4q4 for q4_0
* WIP
* WIP
* Much better FA TG with q8_0 KV cache

  Just repack it even for TG. But do the repacking for k_step rows, not the whole K tensor.

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
Diffstat (limited to 'convert_hf_to_gguf.py')
0 files changed, 0 insertions, 0 deletions