OpenShift AI - Running Models with the NVIDIA Triton Runtime

Published: 2025-07-05

OpenShift / RHEL / DevSecOps series index
Note: this article has been verified on OpenShift 4.18 + OpenShift AI 2.19.

Preparing the Triton Runtime Environment

Adding the Triton Serving Runtime

  1. Go to the Settings -> Serving runtimes menu in RHOAI.
  2. Click the Add serving runtime button.
  3. On the Add serving runtime page, select Multi-model serving platform as the platform and REST as the API protocol.
  4. In the YAML area, click 'Start from scratch', provide the following content, and finally click Create.
apiVersion: serving.kserve.io/v1alpha1
kind: ServingRuntime
metadata:
  name: triton-25.05
  labels:
    name: triton-25.05
  annotations:
    maxLoadingConcurrency: "2"
    openshift.io/display-name: Triton runtime - 25.05-py3
spec:
  supportedModelFormats:
    - name: keras
      version: "2" # 2.6.0
      autoSelect: true
    - name: onnx
      version: "1" # 1.5.3
      autoSelect: true
    - name: pytorch
      version: "1" # 1.8.0a0+17f8c32
      autoSelect: true
    - name: tensorflow
      version: "1" # 1.15.4
      autoSelect: true
    - name: tensorflow
      version: "2" # 2.3.1
      autoSelect: true
    - name: tensorrt
      version: "7" # 7.2.1
      autoSelect: true
    - name: sklearn
      version: "0" # v0.23.1
      autoSelect: false
    - name: xgboost
      version: "1" # v1.1.1
      autoSelect: false
    - name: lightgbm
      version: "3" # v3.2.1
      autoSelect: false
  protocolVersions:
    - grpc-v2
  multiModel: true
  grpcEndpoint: port:8085
  grpcDataEndpoint: port:8001
  volumes:
    - name: shm
      emptyDir:
        medium: Memory
        sizeLimit: 2Gi
  containers:
    - name: triton
      image: nvcr.io/nvidia/tritonserver:25.05-py3
      command:
        - /bin/sh
      args:
        - -c
        - 'mkdir -p /models/_triton_models;
          chmod 777 /models/_triton_models;
          exec tritonserver
          "--model-repository=/models/_triton_models"
          "--model-control-mode=explicit"
          "--strict-model-config=false"
          "--strict-readiness=false"
          "--allow-http=true"
          "--allow-sagemaker=false"'
      volumeMounts:
        - name: shm
          mountPath: /dev/shm
      resources:
        requests:
          cpu: 500m
          memory: 1Gi
        limits:
          cpu: "5"
          memory: 1Gi
      livenessProbe:
        exec:
          command:
            - curl
            - --fail
            - --silent
            - --show-error
            - --max-time
            - "9"
            - http://localhost:8000/v2/health/live
        initialDelaySeconds: 5
        periodSeconds: 30
        timeoutSeconds: 10
  builtInAdapter:
    serverType: triton
    runtimeManagementPort: 8001
    memBufferBytes: 134217728
    modelLoadingTimeoutMillis: 90000
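
If you prefer the CLI, the same resource can also be created with oc. Note the difference in scope: the console makes the runtime available to every project, while oc apply creates a ServingRuntime in a single namespace only; the console route is what the rest of this article assumes. A minimal sketch, assuming the YAML above is saved as triton-runtime.yaml and the target data science project is the current project:

$ oc apply -f triton-runtime.yaml
$ oc get servingruntime triton-25.05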

Running a Model Server Based on the Triton Runtime

  1. In an RHOAI project, set the model serving platform type for Models to Multi-model serving platform.
  2. Under Models, start a Model Server based on the Triton runtime.
  3. Once finished, you can check the running state of the Triton Model Server.
$ oc get deploy
NAME                                    READY   UP-TO-DATE   AVAILABLE   AGE
modelmesh-serving-triton-model-server   1/1     1            1           24h
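
To look deeper than the Deployment status, you can inspect the ModelMesh pod and the Triton container's log; the container name triton comes from the ServingRuntime definition above:

$ oc get pods | grep triton-model-server
$ oc logs deploy/modelmesh-serving-triton-model-server -c triton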


Running Models in the Triton Runtime

Preparing the Model Serving Environment

  1. Create a project in RHOAI, then choose 'Select multi-model' under Models.
  2. Make sure there is a bucket named ai-models in the object store.
  3. Create a Connection named ai-models that connects to the ai-models bucket in the object store.
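
If the ai-models bucket does not exist yet, any S3-compatible client can create it. A sketch using the AWS CLI against MinIO, where the endpoint URL and credentials are placeholders for your environment:

$ export AWS_ACCESS_KEY_ID=<minio-access-key>
$ export AWS_SECRET_ACCESS_KEY=<minio-secret-key>
$ aws --endpoint-url https://<minio-endpoint> s3 mb s3://ai-models
$ aws --endpoint-url https://<minio-endpoint> s3 ls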

Running a PyTorch Model

  1. Clone modelmesh-minio-examples to your local machine, then look at the files under the modelmesh-minio-examples/pytorch/cifar directory.
$ git clone https://github.com/kserve/modelmesh-minio-examples && cd modelmesh-minio-examples/pytorch
$ tree cifar/
cifar/
├── 1
│   └── model.pt
└── config.pbtxt
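
TorchScript models need an explicit config.pbtxt because Triton cannot infer tensor names from a .pt file. A sketch reconstructed from the model metadata queried later in this section (the file in the repository is authoritative); with max_batch_size set, Triton reports the extra leading -1 batch dimension seen in the metadata:

name: "cifar"
platform: "pytorch_libtorch"
max_batch_size: 1
input [
  {
    name: "INPUT__0"
    data_type: TYPE_FP32
    dims: [ 3, 32, 32 ]
  }
]
output [
  {
    name: "OUTPUT__0"
    data_type: TYPE_FP32
    dims: [ 10 ]
  }
]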
  2. Upload the modelmesh-minio-examples/pytorch/cifar directory to the ai-models bucket in the object store.
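The upload can also be scripted with the AWS CLI (same placeholder endpoint and credentials as before); copying the directory recursively preserves the cifar/1/model.pt and cifar/config.pbtxt layout that Triton expects:

$ aws --endpoint-url https://<minio-endpoint> s3 cp --recursive cifar/ s3://ai-models/cifar/
$ aws --endpoint-url https://<minio-endpoint> s3 ls --recursive s3://ai-models/cifar/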
  3. On the Models page in RHOAI, click the Deploy model button on the right of the Triton Model Server row, then deploy the cifar model stored in the object store.
  4. Once finished, you can see the deployment status of the cifar-triton-torch model.
  5. Query the model's input and output formats.
$ MODEL_NAME=cifar-triton-torch
$ MODEL_URL=$(oc get route $MODEL_NAME -ojsonpath=https://{.spec.host})/v2/models/$MODEL_NAME
$ curl -s ${MODEL_URL} | jq
{
  "name": "cifar-triton-torch__isvc-9f77f26bf2",
  "versions": [
    "1"
  ],
  "platform": "pytorch_libtorch",
  "inputs": [
    {
      "name": "INPUT__0",
      "datatype": "FP32",
      "shape": [
        "-1",
        "3",
        "32",
        "32"
      ]
    }
  ],
  "outputs": [
    {
      "name": "OUTPUT__0",
      "datatype": "FP32",
      "shape": [
        "-1",
        "10"
      ]
    }
  ]
}
  6. Download the test data file, then submit it to the cifar-triton-torch model to get the inference result.
$ wget https://raw.githubusercontent.com/kserve/kserve/master/docs/samples/v1beta1/triton/torchscript/input.json
$ curl -s -X POST -k "${MODEL_URL}/infer" -H "Content-Type: application/json" -d @./input.json | jq
{
  "model_name": "cifar-triton-torch__isvc-9f77f26bf2",
  "model_version": "1",
  "outputs": [
    {
      "name": "OUTPUT__0",
      "datatype": "FP32",
      "shape": [
        1,
        10
      ],
      "data": [
        -0.55252016,
        -1.7675304,
        0.6265609,
        1.4070208,
        0.38794953,
        1.3849527,
        -0.16314837,
        0.85409915,
        -0.6349715,
        -0.6840154
      ]
    }
  ]
}
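
The ten values in OUTPUT__0 are raw logits, one per CIFAR-10 class. A quick way to pick the winning class with jq, assuming the standard CIFAR-10 label order (airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck):

$ curl -s -X POST -k "${MODEL_URL}/infer" -H "Content-Type: application/json" -d @./input.json | jq '.outputs[0].data | index(max)'
3

In that label order, index 3 corresponds to cat.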

Running an ONNX Model

  1. Download the model file https://ai-on-openshift.io/odh-rhoai/img-triton/card.fraud.detection.onnx to your local machine.
  2. Upload the model file to the card-fraud-detection folder under the ai-models bucket in the object store.
  3. Deploy the card.fraud.detection.onnx model to the Triton Model Server.
  4. Check the deployment status.
  5. Query the model's input and output formats.
$ MODEL_NAME=card-fraud-detection
$ MODEL_URL=$(oc get route $MODEL_NAME -ojsonpath=https://{.spec.host})/v2/models/$MODEL_NAME
$ curl -s ${MODEL_URL} | jq
{
  "name": "card-fraud-detection-1__isvc-c0a9fa30b8",
  "versions": [
    "1"
  ],
  "platform": "onnxruntime_onnx",
  "inputs": [
    {
      "name": "dense_input",
      "datatype": "FP32",
      "shape": [
        "-1",
        "7"
      ]
    }
  ],
  "outputs": [
    {
      "name": "dense_3",
      "datatype": "FP32",
      "shape": [
        "-1",
        "1"
      ]
    }
  ]
}
  6. Call the card-fraud-detection model.
$ curl -s -X POST -k "${MODEL_URL}/infer" -H "Content-Type: application/json" -d '{"inputs": [{ "name": "dense_input", "shape": [1, 7], "datatype": "FP32", "data": [57.87785658389723,0.3111400080477545,1.9459399775518593,1.0,1.0,0.0,0.0]}]}' | jq
{
  "model_name": "card-fraud-detection__isvc-7bda50d09c",
  "model_version": "1",
  "outputs": [
    {
      "name": "dense_3",
      "datatype": "FP32",
      "shape": [
        1,
        1
      ],
      "data": [
        0.86280495
      ]
    }
  ]
}
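
The single value in dense_3 is the model's sigmoid output and can be read as a fraud probability. With an assumed decision threshold of 0.5 (choose one appropriate for your use case), this transaction would be flagged:

$ curl -s -X POST -k "${MODEL_URL}/infer" -H "Content-Type: application/json" -d '{"inputs": [{ "name": "dense_input", "shape": [1, 7], "datatype": "FP32", "data": [57.87785658389723,0.3111400080477545,1.9459399775518593,1.0,1.0,0.0,0.0]}]}' | jq '.outputs[0].data[0] > 0.5'
true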

Running TensorFlow Models

  1. Clone modelmesh-minio-examples to your local machine, then look at the files under the modelmesh-minio-examples/tensorflow directory.
$ git clone https://github.com/kserve/modelmesh-minio-examples && cd modelmesh-minio-examples
$ tree tensorflow
tensorflow
├── mnist
│   ├── saved_model.pb
│   └── variables
│       ├── variables.data-00000-of-00001
│       └── variables.index
└── simple_string
    ├── 1
    │   └── model.graphdef
    └── config.pbtxt
  2. Upload the tensorflow directory to the ai-models bucket in the object store.
  3. Deploy the two models in RHOAI, using tensorflow/mnist and tensorflow/simple_string as the Path of each deployment.
  4. Once finished, you can see the deployed mnist and simplestring models.
  5. Query each model's input and output formats.
$ MODEL_NAME=mnist
$ MODEL_URL=$(oc get route $MODEL_NAME -ojsonpath=https://{.spec.host})/v2/models/$MODEL_NAME
$ curl -s ${MODEL_URL} | jq
{
  "name": "mnist__isvc-a18e6fe55d",
  "versions": [
    "1"
  ],
  "platform": "tensorflow_savedmodel",
  "inputs": [
    {
      "name": "inputs",
      "datatype": "FP32",
      "shape": [
        "-1",
        "784"
      ]
    }
  ],
  "outputs": [
    {
      "name": "classes",
      "datatype": "INT64",
      "shape": [
        "-1",
        "1"
      ]
    }
  ]
}
  6. Save the following content to a file named mnist-test.json.
{
  "inputs": [{
    "name": "inputs",
    "shape": [1, 784],
    "datatype": "FP32",
    "data": [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.2392, 0.0118, 0.1647, 0.4627, 0.7569, 0.4627, 0.4627, 0.2392, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0549, 0.7020, 0.9608, 0.9255, 0.9490, 0.9961, 0.9961, 0.9961, 0.9961, 0.9608, 0.9216, 0.3294, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.5922, 0.9961, 0.9961, 0.9961, 0.8353, 0.7529, 0.6980, 0.6980, 0.7059, 0.9961, 0.9961, 0.9451, 0.1804, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.1686, 0.9216, 0.9961, 0.8863, 0.2510, 0.1098, 0.0471, 0.0000, 0.0000, 0.0078, 0.5020, 0.9882, 1.0000, 0.6784, 0.0667, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.2196, 0.9961, 0.9922, 0.4196, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.5255, 0.9804, 0.9961, 0.2941, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.2471, 0.9961, 0.6196, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.8667, 0.9961, 0.6157, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.7608, 0.9961, 0.4039, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.5882, 0.9961, 0.8353, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.1333, 0.8627, 0.9373, 0.2275, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.3294, 0.9961, 0.8353, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.4941, 0.9961, 0.6706, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.3294, 0.9961, 0.8353, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.8392, 0.9373, 0.2353, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.3294, 0.9961, 0.8353, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.8392, 0.7804, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.3294, 0.9961, 0.8353, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0431, 0.8588, 0.7804, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.3294, 0.9961, 
0.8353, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.3843, 0.9961, 0.7804, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.6353, 0.9961, 0.8196, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.3843, 0.9961, 0.7804, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.2000, 0.9333, 0.9961, 0.2941, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.3843, 0.9961, 0.7804, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.2000, 0.6471, 0.9961, 0.7647, 0.0157, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.2588, 0.9451, 0.7804, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0118, 0.6549, 0.9961, 0.8902, 0.2157, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.8392, 0.8353, 0.0784, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.1804, 0.5961, 0.7922, 0.9961, 0.9961, 0.2471, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.8392, 0.9961, 0.8000, 0.7059, 0.7059, 0.7059, 0.7059, 0.7059, 0.9216, 0.9961, 0.9961, 0.9176, 0.6118, 0.0392, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.3176, 0.8039, 0.9961, 0.9961, 0.9961, 0.9961, 0.9961, 0.9961, 0.9961, 0.9882, 0.9176, 0.4706, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.1020, 0.8235, 0.9961, 0.9961, 0.9961, 0.9961, 0.9961, 0.6000, 0.4078, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000]
  }]
}
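
Before submitting, it is worth sanity-checking that the payload length matches the 784-element (28x28) input the model expects:

$ jq '.inputs[0].data | length' mnist-test.json
784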
  7. Submit the test file to the mnist model to get the inference result.
$ curl -s -X POST -k "${MODEL_URL}/infer" -H "Content-Type: application/json" -d @./mnist-test.json | jq
{
  "model_name": "mnist__isvc-a18e6fe55d",
  "model_version": "1",
  "outputs": [
    {
      "name": "classes",
      "datatype": "INT64",
      "shape": [
        1,
        1
      ],
      "data": [
        0
      ]
    }
  ]
}

References

https://github.com/kserve/modelmesh-serving/blob/main/config/runtimes/triton-2.x.yaml
https://github.com/kserve/modelmesh-minio-examples/tree/main/pytorch/cifar

https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/user_guide/model_repository.html
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/tutorials/Quick_Deploy/PyTorch/README.html
https://docs.nvidia.com/deeplearning/triton-inference-server/archives/triton_inference_server_1150/user-guide/docs/model_repository.html#pytorch-models
https://catalog.ngc.nvidia.com/orgs/nvidia/containers/tritonserver/tags

https://github.com/triton-inference-server/tutorials/tree/main/Conceptual_Guide/Part_1-model_deployment
https://github.com/triton-inference-server/server/tree/main/docs/examples

https://ai-on-openshift.io/odh-rhoai/custom-runtime-triton
https://ai-on-openshift.io/tools-and-applications/ensemble-serving/ensemble-serving/
https://ai-on-openshift.io/odh-rhoai/custom-runtime-triton/#deploying-a-model-into-it
https://github.com/rh-aiservices-bu/kserve-triton-ensemble-testing
https://github.com/rh-aiservices-bu/kserve-triton-ensemble-testing/blob/main/runtime/runtime-rest.yaml

https://kserve.github.io/website/latest/modelserving/v1beta1/triton/torchscript/

