Notes on Problems Quantizing a Large Language Model on the 910A - EW帮帮网

Environment

OS: Linux devserver-jsrh 4.19.36-vhulk1907.1.0.h1438.eulerosv2r8.aarch64
Python 3.12.7
transformers==4.51.0
# cat  /usr/local/Ascend/firmware/version.info
Version=7.5.0.2.220
firmware_version=1.0
package_version=24.1.0
# cat  /usr/local/Ascend/driver/version.info
Version=24.1.0
ascendhal_version=7.35.23

NPU: 910PremiumA

Procedure

Quantize DeepSeek-R1-Distill-Qwen-32B, following the method described at: https://gitee.com/ascend/ModelZoo-PyTorch/tree/master/MindIE/LLM/DeepSeek/DeepSeek-R1-Distill-Qwen-32B

Quantization command

# cd msit/msmodelslim/example/Qwen
# python3 quant_qwen.py --model_path  /home/models/DeepSeek-R1-Distill-Qwen-32B/  --save_directory /home/models/quant/DeepSeek-R1-Distill-Qwen-32B-quant/ --calib_file ../common/boolq.jsonl --w_bit 8 --a_bit 8 --device_type npu

Files produced by quantization and required changes

# ls quant/DeepSeek-R1-Distill-Qwen-32B-quant/ -lh
total 41G
-rw------- 1 root root  714 Jun 23 17:33 config.json
-rw-r--r-- 1 root root  181 Jun 23 17:33 generation_config.json
-rw------- 1 root root 168K Jun 23 17:33 quant_model_description.json
-r-------- 1 root root  41G Jun 23 17:33 quant_model_weight_w8a8.safetensors
-rw-r--r-- 1 root root 3.0K Jun 23 17:33 tokenizer_config.json
-rw-r--r-- 1 root root 6.8M Jun 23 17:33 tokenizer.json

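Using the quantized output as-is fails at inference time. Add the following two entries to config.json (the example path names a Qwen2-7B-Instruct-quant directory; substitute the directory of your own quantized model):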

  "quantize": "w8a8",
  "quant_config_path": "/home/models/quant/Qwen2-7B-Instruct-quant/quant_model_description.json"

If you would rather not maintain an absolute path, leave out "quant_config_path" and instead rename quant_model_description.json to quant_model_description_w8a8.json.
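The config.json edit can also be scripted rather than done by hand. The following is a minimal sketch using only the Python standard library; the function name add_quant_fields and its parameters are made up for illustration and are not part of msmodelslim or atb_llm. It assumes config.json is a plain JSON object and that the current user can write to it.

```python
import json
from pathlib import Path
from typing import Optional


def add_quant_fields(config_path: str,
                     quantize: str = "w8a8",
                     quant_config_path: Optional[str] = None) -> dict:
    """Add the quantization entries to an existing config.json.

    Hypothetical helper for illustration only. Existing keys are kept;
    "quant_config_path" is written only when a value is given.
    """
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))

    config["quantize"] = quantize
    if quant_config_path is not None:
        config["quant_config_path"] = quant_config_path

    # Write the file back in place, keeping non-ASCII characters readable.
    path.write_text(
        json.dumps(config, indent=2, ensure_ascii=False) + "\n",
        encoding="utf-8",
    )
    return config
```

For the model above, this would be called with the config.json inside /home/models/quant/DeepSeek-R1-Distill-Qwen-32B-quant/, passing quant_config_path only if you want the absolute-path variant.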

Errors

Error 1

Inference fails with:

File "/usr/local/Ascend/atb-models/atb_llm/utils/layers/linear/fast_linear.py", line 37, in __init__
    raise ValueError("linear type not matched, please check `config.json` `quantize` parameter")
ValueError: linear type not matched, please check `config.json` `quantize` parameter

Taken at face value, the message means config.json has no quantize entry.
Fix: add the quantization type to config.json in the model directory:

 "quantize": "w8a8"

Error 2

With the first problem fixed, a second one appears:

2025-06-23 13:41:47,437 [ERROR] model.py:39 - [Model]   >>> Exception:
Traceback (most recent call last):
  File "/usr/local/Ascend/atb-models/atb_llm/utils/weights.py", line 841, in _set_quant_params
    with file_utils.safe_open(filename, 'r') as f:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/Ascend/atb-models/atb_llm/utils/file_utils.py", line 40, in safe_open
    check_file_safety(file_path, mode, is_exist_ok, max_file_size)
  File "/usr/local/Ascend/atb-models/atb_llm/utils/file_utils.py", line 148, in check_file_safety
    raise FileNotFoundError("The file is expected to exist, but it does not. "
FileNotFoundError: The file is expected to exist, but it does not. Please check the input path:/home/models/quant/Qwen2-7B-Instruct-quant/quant_model_description_w8a8.json

The loader cannot find quant_model_description_w8a8.json, even though the directory does contain quant_model_description.json.

Fix: rename quant_model_description.json to quant_model_description_w8a8.json.
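The rename can be scripted so that it follows whatever quantize value is set in config.json. This is a sketch only: the naming pattern quant_model_description_<quantize>.json is inferred from the traceback and the fix above, not taken from atb_llm documentation, and ensure_description_file is a hypothetical helper name. It is not needed if "quant_config_path" is set in config.json.

```python
import json
from pathlib import Path


def ensure_description_file(model_dir: str) -> Path:
    """Make sure quant_model_description_<quantize>.json exists in model_dir.

    Hypothetical helper for illustration. The file-name pattern is an
    assumption based on the FileNotFoundError shown above.
    """
    root = Path(model_dir)
    config = json.loads((root / "config.json").read_text(encoding="utf-8"))

    quantize = config.get("quantize")
    if not quantize:
        raise ValueError("config.json has no 'quantize' entry; add it first")

    expected = root / f"quant_model_description_{quantize}.json"
    generic = root / "quant_model_description.json"

    if expected.exists():
        return expected  # nothing to do
    if not generic.exists():
        raise FileNotFoundError(
            f"neither {expected.name} nor {generic.name} found in {root}"
        )

    # Rename the file written by the quantization script to the expected name.
    generic.rename(expected)
    return expected
```

Calling it a second time is harmless: once the renamed file exists, the function simply returns its path.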
