Environment
OS: Linux devserver-jsrh 4.19.36-vhulk1907.1.0.h1438.eulerosv2r8.aarch64
Python 3.12.7
transformers==4.51.0
# cat /usr/local/Ascend/firmware/version.info
Version=7.5.0.2.220
firmware_version=1.0
package_version=24.1.0
# cat /usr/local/Ascend/driver/version.info
Version=24.1.0
ascendhal_version=7.35.23
NPU: 910PremiumA
Procedure
Quantize DeepSeek-R1-Distill-Qwen-32B, following the method described at: https://gitee.com/ascend/ModelZoo-PyTorch/tree/master/MindIE/LLM/DeepSeek/DeepSeek-R1-Distill-Qwen-32B
Quantization command
# cd msit/msmodelslim/example/Qwen
# python3 quant_qwen.py --model_path /home/models/DeepSeek-R1-Distill-Qwen-32B/ --save_directory /home/models/quant/DeepSeek-R1-Distill-Qwen-32B-quant/ --calib_file ../common/boolq.jsonl --w_bit 8 --a_bit 8 --device_type npu
Quantized output files and required changes
# ls quant/DeepSeek-R1-Distill-Qwen-32B-quant/ -lh
total 41G
-rw------- 1 root root 714 Jun 23 17:33 config.json
-rw-r--r-- 1 root root 181 Jun 23 17:33 generation_config.json
-rw------- 1 root root 168K Jun 23 17:33 quant_model_description.json
-r-------- 1 root root 41G Jun 23 17:33 quant_model_weight_w8a8.safetensors
-rw-r--r-- 1 root root 3.0K Jun 23 17:33 tokenizer_config.json
-rw-r--r-- 1 root root 6.8M Jun 23 17:33 tokenizer.json
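Using the quantized model as-is fails with an error. config.json must be edited to add the two fields below; quant_config_path must point to the quant_model_description.json in your own quantized model directory: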
"quantize": "w8a8",
"quant_config_path": "/home/models/quant/Qwen2-7B-Instruct-quant/quant_model_description.json"
If you would rather not use an absolute path, omit "quant_config_path" and instead rename quant_model_description.json to quant_model_description_w8a8.json.
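The config.json change described above can also be scripted rather than made by hand. The following is a minimal sketch using only the Python standard library; the helper name add_quant_fields and its parameters are hypothetical and are not part of msmodelslim or atb_llm.

```python
import json
from pathlib import Path


def add_quant_fields(config_path, quantize="w8a8", quant_config_path=None):
    """Add the quantization fields to a model's config.json in place.

    Hypothetical helper: it only edits the JSON file and does not
    check that the weights really match the given quantize type.
    """
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))

    # Field required by the loader (the value matches --w_bit 8 --a_bit 8).
    config["quantize"] = quantize

    # Optional: absolute path to quant_model_description.json.
    # Leave it out if you rename the description file instead.
    if quant_config_path is not None:
        config["quant_config_path"] = str(quant_config_path)

    path.write_text(
        json.dumps(config, indent=2, ensure_ascii=False) + "\n",
        encoding="utf-8",
    )
    return config


# Example call (adjust the directory to your own setup):
# add_quant_fields("/home/models/quant/DeepSeek-R1-Distill-Qwen-32B-quant/config.json")
```

All existing keys in config.json are preserved; only the two quantization fields are added or overwritten.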
Errors
Error 1
Inference fails with:
File "/usr/local/Ascend/atb-models/atb_llm/utils/layers/linear/fast_linear.py", line 37, in __init__
raise ValueError("linear type not matched, please check `config.json` `quantize` parameter")
ValueError: linear type not matched, please check `config.json` `quantize` parameter
Taken literally, the message means that config.json is missing the quantize field.
Fix: add the quantization type to the config.json in the model directory:
"quantize": "w8a8"
Error 2
With the first problem fixed, a second one appears:
2025-06-23 13:41:47,437 [ERROR] model.py:39 - [Model] >>> Exception:
Traceback (most recent call last):
File "/usr/local/Ascend/atb-models/atb_llm/utils/weights.py", line 841, in _set_quant_params
with file_utils.safe_open(filename, 'r') as f:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/Ascend/atb-models/atb_llm/utils/file_utils.py", line 40, in safe_open
check_file_safety(file_path, mode, is_exist_ok, max_file_size)
File "/usr/local/Ascend/atb-models/atb_llm/utils/file_utils.py", line 148, in check_file_safety
raise FileNotFoundError("The file is expected to exist, but it does not. "
FileNotFoundError: The file is expected to exist, but it does not. Please check the input path:/home/models/quant/Qwen2-7B-Instruct-quant/quant_model_description_w8a8.json
The loader cannot find quant_model_description_w8a8.json, even though quant_model_description.json is present in the directory.
Fix: rename quant_model_description.json to quant_model_description_w8a8.json.
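Both failures can be caught before starting inference with a small pre-flight check. The sketch below is an assumption-laden illustration, not atb_llm code: the function name ensure_quant_description is hypothetical, and the default file name quant_model_description_<quantize>.json is inferred only from the path shown in the traceback above.

```python
import json
from pathlib import Path


def ensure_quant_description(model_dir, rename=False):
    """Return the description file the loader is assumed to look for.

    Hypothetical pre-flight check. With rename=True, an unsuffixed
    quant_model_description.json is renamed to the suffixed name.
    """
    model_dir = Path(model_dir)
    config = json.loads((model_dir / "config.json").read_text(encoding="utf-8"))

    quantize = config.get("quantize")
    if not quantize:
        raise ValueError("config.json has no `quantize` field")

    # An explicit quant_config_path replaces the default file name.
    explicit = config.get("quant_config_path")
    if explicit:
        target = Path(explicit)
        if not target.is_file():
            raise FileNotFoundError(f"quant_config_path does not exist: {target}")
        return target

    # Assumption: default name inferred from the error message.
    expected = model_dir / f"quant_model_description_{quantize}.json"
    plain = model_dir / "quant_model_description.json"
    if expected.is_file():
        return expected
    if rename and plain.is_file():
        plain.rename(expected)
        return expected
    raise FileNotFoundError(f"missing {expected.name} in {model_dir}")


# Example call (adjust the directory to your own setup):
# ensure_quant_description("/home/models/quant/DeepSeek-R1-Distill-Qwen-32B-quant", rename=True)
```

Renaming is opt-in so that a plain call only reports problems and never modifies the model directory.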