diff options
author:    jiez <373447296@qq.com>  2024-04-25 18:29:35 +0800
committer: GitHub <noreply@github.com>  2024-04-25 13:29:35 +0300
commit:    1966eb2615242f224bf9ca939db8905ab6a174a0 (patch)
tree:      3da33a1b5f816723e195a4936d44c4bef2eaa06a /llama.h
parent:    784e11dea1f5ce9638851b2b0dddb107e2a609c8 (diff)
quantize : add '--keep-split' to quantize model into shards (#6688)
* Implement '--keep-split' to quantize model into several shards
* Add test script
* Update examples/quantize/quantize.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Split model correctly even if tensor id is out-of-order
* Update llama_model_quantize_params
* Fix preci failures
---------
Co-authored-by: z5269887 <z5269887@unsw.edu.au>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Diffstat (limited to 'llama.h')
-rw-r--r--  llama.h  1
1 file changed, 1 insertion, 0 deletions
@@ -288,6 +288,7 @@ extern "C" {
         bool quantize_output_tensor; // quantize output.weight
         bool only_copy;              // only copy tensors - ftype, allow_requantize and quantize_output_tensor are ignored
         bool pure;                   // quantize all tensors to the default type
+        bool keep_split;             // quantize to the same number of shards
         void * imatrix;              // pointer to importance matrix data
         void * kv_overrides;         // pointer to vector containing overrides
     } llama_model_quantize_params;