OpenVINO Tutorial: CPU/GPU/NPU Acceleration Comparison


In this section


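OpenVINO supports several inference devices, as introduced in the first article of this series (OpenVINO Tutorial: Overview). It mainly targets Intel hardware, including Intel CPUs, GPUs, and NPUs. This section compares inference speed with OpenVINO on an Intel CPU, GPU, and NPU.
I had always assumed that, for the same model, inference speed would rank NPU > GPU > CPU, but the tests below show that this is not necessarily the case.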

Segmentation model test

The example is taken from the hello-segmentation example in the openvino_notebooks repository:

import cv2
import matplotlib.pyplot as plt
import numpy as np
import openvino as ov
from pathlib import Path
import time

model_xml_path = "./model/road-segmentation-adas-0001.xml"
device = "CPU"

core = ov.Core()
model = core.read_model(model=model_xml_path)
compiled_model = core.compile_model(model=model, device_name=device)

input_layer_ir = compiled_model.input(0)
output_layer_ir = compiled_model.output(0)

image_filename = Path("data/empty_road_mapillary.jpg")

# The segmentation network expects images in BGR format.
image = cv2.imread(str(image_filename))

rgb_image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
image_h, image_w, _ = image.shape

# N,C,H,W = batch size, number of channels, height, width.
N, C, H, W = input_layer_ir.shape

# OpenCV resize expects the destination size as (width, height).
resized_image = cv2.resize(image, (W, H))

# Reshape to the network input shape.
input_image = np.expand_dims(resized_image.transpose(2, 0, 1), 0)

st = time.time()
for i in range(100):
    # Run the inference.
    result = compiled_model([input_image])[output_layer_ir]
ed = time.time()
elapsed_tm = ed - st
print('elapsed time: ', elapsed_tm)

# Prepare data for visualization.
segmentation_mask = np.argmax(result, axis=1)

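Running the code above prints the total time taken for 100 inferences. Changing device to GPU or NPU measures the corresponding device instead. On my machine, the inference times compare as follows: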

CPU: 1.354 s
GPU: 2.132 s
NPU: 3.317 s

This result is counterintuitive: in inference speed, CPU > GPU > NPU!
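To make the totals above easier to compare, here is a minimal sketch that converts them into an average per-inference latency and a ratio against the fastest device. The figures are the ones reported above; the helper name `per_inference_ms` is only illustrative. Each average includes the Python call overhead of the loop, not just device time.

```python
# Total wall-clock time for 100 inferences, as reported above (seconds).
totals_s = {"CPU": 1.354, "GPU": 2.132, "NPU": 3.317}
RUNS = 100


def per_inference_ms(total_s, runs=RUNS):
    """Average latency of one inference in milliseconds."""
    return total_s / runs * 1000.0


latency_ms = {dev: per_inference_ms(t) for dev, t in totals_s.items()}
fastest = min(latency_ms, key=latency_ms.get)

for dev, ms in latency_ms.items():
    ratio = ms / latency_ms[fastest]
    print(f"{dev}: {ms:.2f} ms per inference, {ratio:.2f}x the {fastest} latency")
```

On these numbers the CPU averages roughly 13.5 ms per inference, the GPU about 21.3 ms, and the NPU about 33.2 ms.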

Detection model test

The example is taken from the yolov12-optimization example in the openvino_notebooks repository:

from pathlib import Path
from PIL import Image
from ultralytics import YOLO
import openvino as ov
import time

IMAGE_PATH = Path("./data/coco_bike.jpg")
DET_MODEL_NAME = "yolo12m"
device = "CPU"

det_model_path = Path(f"{DET_MODEL_NAME}_openvino_model/{DET_MODEL_NAME}.xml")

core = ov.Core()
det_ov_model = core.read_model(det_model_path)

ov_config = {}
if device != "CPU":
    det_ov_model.reshape({0: [1, 3, 640, 640]})
if "GPU" in device or ("AUTO" in device and "GPU" in core.available_devices):
    ov_config = {"GPU_DISABLE_WINOGRAD_CONVOLUTION": "YES"}
det_compiled_model = core.compile_model(det_ov_model, device, ov_config)
det_model = YOLO(det_model_path.parent, task="detect")

if det_model.predictor is None:
    custom = {"conf": 0.25, "batch": 1, "save": False, "mode": "predict"}  # method defaults
    args = {**det_model.overrides, **custom}
    det_model.predictor = det_model._smart_load("predictor")(overrides=args, _callbacks=det_model.callbacks)
    det_model.predictor.setup_model(model=det_model.model)
    det_model.predictor.model.dynamic = False

det_model.predictor.model.ov_compiled_model = det_compiled_model

st = time.time()
for i in range(100):
    res = det_model(IMAGE_PATH)
ed = time.time()
print("time cost: ", ed - st)

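Running the code above prints the total time taken for 100 inferences. Changing device to GPU or NPU measures the corresponding device instead. On my machine, the inference times compare as follows: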

NPU:  62.9 ms
GPU:  50.6 ms
CPU:  262.2 ms

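These figures differ from what the code prints, because the det_model interface includes the model's pre-processing and post-processing. I roughly extracted the inference-only time from the timing statistics it prints to obtain the numbers above. This result looks closer to expectations: in inference speed, GPU > NPU > CPU.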
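The numbers above were read off the printed statistics by hand. As an alternative, the sketch below averages per-stage timings programmatically. It assumes each Ultralytics result exposes a `speed` dictionary with `preprocess`, `inference`, and `postprocess` times in milliseconds; the helper `average_stage_times` and the sample values are hypothetical stand-ins, not measurements from this article.

```python
from statistics import mean


def average_stage_times(speed_dicts):
    """Average per-stage times (ms) over a list of speed dictionaries."""
    stages = ("preprocess", "inference", "postprocess")
    return {stage: mean(d[stage] for d in speed_dicts) for stage in stages}


# Synthetic stand-in values, for illustration only (not measurements).
fake_speeds = [
    {"preprocess": 2.0, "inference": 50.0, "postprocess": 1.0},
    {"preprocess": 4.0, "inference": 52.0, "postprocess": 3.0},
]

avg = average_stage_times(fake_speeds)
print(avg)
```

In the benchmark loop, the list could be filled with `speeds.append(det_model(IMAGE_PATH)[0].speed)` and then passed to the helper, so that only the `inference` entry is compared across devices.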

Analysis & summary

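The segmentation model used in the first example is very small, under 1 MB, while the detection model is about 40 MB. That may explain the anomalous segmentation result: the model may simply be too small for the other two devices to show any advantage.
In the detection example the GPU was slightly faster than the NPU, which also surprised me. On that evidence the NPU seems to offer little advantage, although only two small models were tested here, so the conclusion is not rigorous.
During testing I found that an example that runs on the other devices may raise errors on the NPU, and these two examples were the ones I ended up choosing from many candidates. Perhaps the NPU in my machine is not powerful enough. In any case, it was not as easy to use as I had imagined.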
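One way to make such a comparison somewhat more robust is to run a few untimed warm-up calls first and report a median rather than a single total, since the first calls on a device can include one-off setup cost. The sketch below is a generic timing helper under that assumption; `benchmark` is a hypothetical name, and the workload is a stand-in so that the code runs without OpenVINO installed.

```python
import statistics
import time


def benchmark(fn, warmup=10, runs=100):
    """Time fn() `runs` times after `warmup` untimed calls; return stats in ms."""
    for _ in range(warmup):
        fn()  # warm-up calls are executed but not timed
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return {
        "median_ms": statistics.median(samples),
        "mean_ms": statistics.mean(samples),
        "min_ms": min(samples),
        "max_ms": max(samples),
    }


# Stand-in workload so the sketch runs without OpenVINO installed.
stats = benchmark(lambda: sum(range(10_000)), warmup=3, runs=20)
print(stats)
```

With the segmentation example, the callable could be `lambda: compiled_model([input_image])`, called once per device.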
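Because a model may compile on one device and fail on another, it can help to probe every device up front and collect the errors instead of stopping at the first failure. The sketch below shows the idea with a generic helper. `probe_devices` and `fake_compile` are hypothetical names, and the simulated NPU failure is invented for illustration rather than taken from a real error.

```python
def probe_devices(compile_fn, devices=("CPU", "GPU", "NPU")):
    """Try compile_fn(device) for each device; map device -> None or error text."""
    report = {}
    for device in devices:
        try:
            compile_fn(device)
            report[device] = None
        except Exception as exc:  # record every failure instead of stopping
            report[device] = f"{type(exc).__name__}: {exc}"
    return report


# Hypothetical stand-in that fails on NPU, so the sketch runs without OpenVINO.
def fake_compile(device):
    if device == "NPU":
        raise RuntimeError("unsupported operation (simulated)")
    return object()


report = probe_devices(fake_compile)
for device, error in report.items():
    print(device, "ok" if error is None else f"failed: {error}")
```

With OpenVINO, the callable could be `lambda d: core.compile_model(model, d)` and the device list could come from `core.available_devices`.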
