Integrating OpenTelemetry + Grafana: End-to-End Observability for ABP vNext

Published: 2025-05-21


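In modern microservice architectures, observability is a core capability for keeping systems stable and performant. Aiming at a production-grade configuration, this article wires up OpenTelemetry + Collector + Grafana in ABP vNext and covers end-to-end tracing, metrics, log correlation, and alerting.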



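🧠 Background

  • OpenTelemetry
    An open-source observability framework that collects and exports distributed traces, metrics, and logs through one set of APIs.

  • Grafana
    A visualization platform that supports many data sources (Prometheus, Tempo, Elasticsearch, and others) and provides dashboards, alert rules, and a Service Map.

  • ABP vNext
    A modular application framework built on ASP.NET Core. Its middleware pipeline and dependency injection make it a natural fit for integrating OpenTelemetry.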


🏗️ Overall Architecture

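[Sequence diagram. Participants: User, ABP App, OTel SDK, OTel Collector, Prometheus, Tempo, Grafana. Messages in order: HTTP Request (User → ABP App); Trace + Metrics + Logs (ABP App → OTel SDK); OTLP (gRPC/HTTP) (OTel SDK → OTel Collector); Metrics (Prometheus) (OTel Collector → Prometheus); Traces (OTLP) (OTel Collector → Tempo); Query /metrics (Grafana → Prometheus); Query /traces (Grafana → Tempo).]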

Deployment topology

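[Topology diagram. AppCluster: ABP App Pod 1, ABP App Pod 2. Infra: OTel Collector, Prometheus, Tempo, Grafana. Both pods send OTLP (gRPC/HTTP) to the OTel Collector; the Collector forwards metrics to Prometheus and traces to Tempo; Grafana reads metrics from Prometheus and traces from Tempo.]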

🔧 Environment Setup

  • .NET SDK
  • ABP vNext
  • Docker & Docker Compose

NuGet package dependencies

dotnet add package OpenTelemetry.Extensions.Hosting
dotnet add package OpenTelemetry.Instrumentation.AspNetCore
dotnet add package OpenTelemetry.Instrumentation.Http
dotnet add package OpenTelemetry.Instrumentation.Runtime
dotnet add package OpenTelemetry.Exporter.OpenTelemetryProtocol
# The next two packages may only be available as prerelease versions
dotnet add package OpenTelemetry.Instrumentation.EntityFrameworkCore --prerelease
dotnet add package OpenTelemetry.Exporter.Prometheus.AspNetCore --prerelease
# logging.AddOpenTelemetry(...) is provided by the core OpenTelemetry package
# (a dependency of OpenTelemetry.Extensions.Hosting), so no separate logging package is listed

Environment configuration files

appsettings.Development.json
{
  "OpenTelemetry": {
    "OtlpEndpoint": "http://collector:4317",
    "SamplerRatio": 1.0
  }
}
appsettings.Production.json
{
  "OpenTelemetry": {
    "OtlpEndpoint": "https://otel-collector.internal:4317",
    "SamplerRatio": 0.1,
    "AuthToken": "${OTEL_EXPORTER_OTLP_HEADERS}"
  }
}
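
One detail in the production file deserves care: the stock .NET configuration providers do not expand `${...}` placeholders, so `${OTEL_EXPORTER_OTLP_HEADERS}` reaches the application as a literal string unless something else substitutes it first. The sketch below uses only the base class library to show one way of normalizing both values before they are handed to the SDK. The ratio is parsed with the invariant culture and clamped to [0, 1], and an unexpanded placeholder is treated as "no token". The helper names `ParseSamplerRatio` and `NormalizeToken` are hypothetical and are not part of ABP or OpenTelemetry.

```csharp
using System;
using System.Globalization;

// Hypothetical helper: parse the ratio independently of the machine culture
// and clamp it to the valid range [0, 1]; fall back when the value is unusable.
static double ParseSamplerRatio(string? raw, double fallback = 1.0)
{
    if (!double.TryParse(raw, NumberStyles.Float, CultureInfo.InvariantCulture, out var ratio)
        || double.IsNaN(ratio))
    {
        return fallback;
    }
    return Math.Clamp(ratio, 0.0, 1.0);
}

// Hypothetical helper: treat empty values and unexpanded "${...}" placeholders as "no token".
static string? NormalizeToken(string? raw)
{
    if (string.IsNullOrWhiteSpace(raw)) return null;
    var value = raw.Trim();
    bool isPlaceholder = value.StartsWith("${", StringComparison.Ordinal)
                         && value.EndsWith("}", StringComparison.Ordinal);
    return isPlaceholder ? null : value;
}

Console.WriteLine(ParseSamplerRatio("0.1") == 0.1);   // parsed with the invariant culture
Console.WriteLine(ParseSamplerRatio("7") == 1.0);     // out-of-range values are clamped
Console.WriteLine(NormalizeToken("${OTEL_EXPORTER_OTLP_HEADERS}") is null);
```

In ASP.NET Core the token can also be supplied through an environment variable named `OpenTelemetry__AuthToken`, which the default environment-variable provider maps to the `OpenTelemetry:AuthToken` key.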

Docker Compose (including the Collector)

version: '3.8'
services:
  collector:
    image: otel/opentelemetry-collector:0.76.0
    volumes:
      - ./otel-collector-config.yaml:/etc/otel/config.yaml
    command: ["--config", "/etc/otel/config.yaml"]
    ports:
      - "4317:4317"
      - "8888:8888"

  prometheus:
    image: prom/prometheus:v2.45.0
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml

  tempo:
    image: grafana/tempo:2.4.1
    ports:
      - "3200:3200"

  grafana:
    image: grafana/grafana:10.0.3
    ports:
      - "3000:3000"
    volumes:
      - grafana-data:/var/lib/grafana

volumes:
  grafana-data:

otel-collector-config.yaml
receivers:
  otlp:
    protocols:
      grpc:
      http:

exporters:
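  prometheus:
    # 8888 is the Collector's default port for its own telemetry, so the exporter listens on 8889
    # (reachable as collector:8889 inside the Compose network)
    endpoint: "0.0.0.0:8889"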
  otlp:
    endpoint: "${OTEL_EXPORTER_OTLP_ENDPOINT}"
    headers:
      "Authorization": "Bearer ${OTEL_EXPORTER_OTLP_HEADERS}"
    tls:
      ca_file: "/etc/otel/ca.crt"
      cert_file: "/etc/otel/client.crt"
      key_file: "/etc/otel/client.key"

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlp]
    metrics:
      receivers: [otlp]
      exporters: [prometheus]

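Security recommendations

  • In Kubernetes, expose the Collector through a ClusterIP Service and have front-end services authenticate with mTLS or a token;
  • Keep ports 4317 and 8888 on the internal network and never expose them publicly.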

🚀 Integration Steps

1️⃣ Register everything in the ABP module

// Requires: using OpenTelemetry.Logs; using OpenTelemetry.Metrics;
//           using OpenTelemetry.Resources; using OpenTelemetry.Trace;
public override void PreConfigureServices(ServiceConfigurationContext context)
{
    var services      = context.Services;
    var configuration = services.GetConfiguration();
    var otlpEndpoint  = new Uri(configuration["OpenTelemetry:OtlpEndpoint"]!);

    // OpenTelemetry.Extensions.Hosting 1.4+ API: AddOpenTelemetry().WithTracing(...).WithMetrics(...)
    services.AddOpenTelemetry()
        // ---- Tracing ----
        .WithTracing(builder =>
        {
            builder
                .SetResourceBuilder(ResourceBuilder.CreateDefault()
                    .AddService("MyApp", serviceVersion: "1.0.0"))
                .AddAspNetCoreInstrumentation(opts => opts.RecordException = true)
                .AddHttpClientInstrumentation()
                .AddEntityFrameworkCoreInstrumentation()
                .AddSource("MyApp")
                .AddOtlpExporter(opt =>
                {
                    opt.Endpoint = otlpEndpoint;
                    // Headers is a "key=value" string, not a collection
                    if (configuration["OpenTelemetry:AuthToken"] is { Length: > 0 } token)
                        opt.Headers = $"Authorization=Bearer {token}";
                })
                // GetValue parses with the invariant culture; sample everything if the key is missing
                .SetSampler(new TraceIdRatioBasedSampler(
                    configuration.GetValue<double>("OpenTelemetry:SamplerRatio", 1.0)));
        })
        // ---- Metrics ----
        .WithMetrics(builder =>
        {
            builder
                .AddAspNetCoreInstrumentation()
                .AddHttpClientInstrumentation()
                .AddRuntimeInstrumentation()   // runtime instrumentation produces metrics only
                .AddMeter("MyAppMetrics")      // must match the Meter name used in business code
                .AddPrometheusExporter();
        });

    // ---- Logs ----
    services.AddLogging(logging =>
    {
        logging.AddOpenTelemetry(options =>
        {
            options.IncludeFormattedMessage = true;
            options.ParseStateValues      = true;
            options.AddOtlpExporter(opt => opt.Endpoint = otlpEndpoint);
        });
    });
}
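
The `SamplerRatio` setting (1.0 in development, 0.1 in production) controls what fraction of traces is kept. Ratio sampling is usually made deterministic by deriving the decision from the trace ID, so that every service seeing the same trace reaches the same verdict. The following is a simplified model of that idea under that assumption, not the SDK's actual implementation; `ShouldSample` is a hypothetical name.

```csharp
using System;
using System.Buffers.Binary;
using System.Diagnostics;

// Simplified model of trace-ID ratio sampling (hypothetical helper, not the SDK's code):
// read the first 8 bytes of the trace ID as an unsigned number and keep the trace
// when that number falls below ratio * 2^64.
static bool ShouldSample(ActivityTraceId traceId, double ratio)
{
    if (ratio >= 1.0) return true;    // keep everything
    if (ratio <= 0.0) return false;   // drop everything

    Span<byte> bytes = stackalloc byte[16];
    traceId.CopyTo(bytes);
    ulong value = BinaryPrimitives.ReadUInt64BigEndian(bytes);
    return value < ratio * ulong.MaxValue;
}

// The decision depends only on the trace ID, so it is repeatable.
var traceId = ActivityTraceId.CreateRandom();
Console.WriteLine(ShouldSample(traceId, 0.1) == ShouldSample(traceId, 0.1));

// Over many random trace IDs, roughly 10% should be kept.
int kept = 0;
const int total = 100_000;
for (int i = 0; i < total; i++)
{
    if (ShouldSample(ActivityTraceId.CreateRandom(), 0.1)) kept++;
}
Console.WriteLine($"kept {kept} of {total}");
```

Because the decision is a pure function of the trace ID, two services configured with the same ratio keep or drop the same traces without coordinating.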

Flowchart: initialization and registration order

Module load (PreConfigureServices) → Get IConfiguration → Register Tracing → Register Metrics → Register Logs → Module startup complete

2️⃣ Custom tracing and metrics

// Register the ActivitySource; its name must match .AddSource("MyApp")
services.AddSingleton<ActivitySource>(_ => new ActivitySource("MyApp"));

// Use it in a business service
public class OrderService
{
    private readonly ActivitySource _activitySource;
    public OrderService(ActivitySource activitySource) => _activitySource = activitySource;

    public async Task ProcessAsync()
    {
        // Internal: an in-process operation; the incoming HTTP request already has a Server span
        using var activity = _activitySource.StartActivity("Order.Process", ActivityKind.Internal);
        activity?.SetTag("order.id", Guid.NewGuid().ToString());
        // …business logic…
    }
}

// Custom metric example (create the Meter once and reuse it, e.g. as a static field or singleton)
var meter        = new Meter("MyAppMetrics", "1.0.0");
var orderCounter = meter.CreateCounter<long>("order.processed.count");
orderCounter.Add(1, new KeyValuePair<string, object?>("status", "success"));
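
`StartActivity` returns null when nothing is listening to the source, which is why the service code uses `activity?.SetTag(...)` and why the source name has to be registered with `.AddSource("MyApp")`. This behaviour can be observed with the base class library alone. In the sketch below an `ActivityListener` stands in for the OpenTelemetry SDK, an assumption made only to keep the example self-contained.

```csharp
using System;
using System.Diagnostics;

var source = new ActivitySource("MyApp");

// No listener yet: nothing would consume the span, so StartActivity returns null.
bool nullWithoutListener;
using (var activity = source.StartActivity("Order.Process"))
{
    nullWithoutListener = activity is null;
}

// Stand-in for the SDK: listen to the "MyApp" source and record everything.
using var listener = new ActivityListener
{
    ShouldListenTo = s => s.Name == "MyApp",
    Sample = (ref ActivityCreationOptions<ActivityContext> options) =>
        ActivitySamplingResult.AllDataAndRecorded,
};
ActivitySource.AddActivityListener(listener);

// With a listener attached, the same call now yields a real Activity.
bool startedWithListener;
using (var activity = source.StartActivity("Order.Process"))
{
    activity?.SetTag("order.id", "demo-order");
    startedWithListener = activity is not null;
}

Console.WriteLine($"null without listener: {nullWithoutListener}");
Console.WriteLine($"started with listener: {startedWithListener}");
```

If custom spans never show up in Tempo, a mismatch between the `ActivitySource` name and the name passed to `.AddSource(...)` is a likely first thing to check.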

3️⃣ Expose the metrics endpoint in Program.cs

app.UseRouting();

app.UseOpenTelemetryPrometheusScrapingEndpoint("/metrics");

app.UseAuthentication();
app.UseAuthorization();

app.UseEndpoints(endpoints =>
{
    endpoints.MapControllers();
});

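📊 Grafana Visualization and Alerting

  • Data sources

    • Tempo (traces), Prometheus (metrics)
  • Recommended dashboards

    1. HTTP latency histogram
    2. Service Map (Trace Explorer)
    3. Custom business metrics (order processing rate, success rate)
  • Alert examples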

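    # Metric and label names depend on the instrumentation version; recent ASP.NET Core
    # instrumentation typically exposes http_server_request_duration_seconds with the
    # label http_response_status_code, so adjust the names below to what /metrics shows.

    # 95th-percentile request latency above 500 ms
    histogram_quantile(0.95, sum(rate(request_duration_seconds_bucket[5m])) by (le)) > 0.5
    
    # 5xx error rate above 1%
    sum(rate(request_duration_seconds_count{status=~"5.."}[5m]))
      / sum(rate(request_duration_seconds_count[5m])) > 0.01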
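
To make the latency alert less of a black box, here is a simplified model of how `histogram_quantile` estimates a percentile from cumulative `le` buckets: it finds the bucket that contains the target rank and interpolates linearly inside it. The bucket boundaries and counts are made-up illustration values, and `HistogramQuantile` is a hypothetical helper, not Prometheus code.

```csharp
using System;

// Simplified model of histogram_quantile for classic buckets (hypothetical helper).
// upperBounds: the `le` values in ascending order, ending with +Inf.
// cumulativeCounts: the matching cumulative counts (or per-second rates).
static double HistogramQuantile(double q, double[] upperBounds, double[] cumulativeCounts)
{
    double total = cumulativeCounts[^1];
    if (upperBounds.Length < 2 || total == 0) return double.NaN;

    double rank = q * total;

    // First bucket whose cumulative count reaches the rank.
    int i = 0;
    while (i < upperBounds.Length - 1 && cumulativeCounts[i] < rank) i++;

    // Rank falls into the +Inf bucket: report the highest finite bound.
    if (double.IsPositiveInfinity(upperBounds[i])) return upperBounds[i - 1];

    double lower      = i == 0 ? 0.0 : upperBounds[i - 1];
    double countBelow = i == 0 ? 0.0 : cumulativeCounts[i - 1];
    double inBucket   = cumulativeCounts[i] - countBelow;

    // Linear interpolation inside the bucket.
    return lower + (upperBounds[i] - lower) * (rank - countBelow) / inBucket;
}

// Illustration values: 100 requests, 94 of them finished within 0.5 s.
double[] le     = { 0.1, 0.25, 0.5, 1.0, double.PositiveInfinity };
double[] counts = { 50, 80, 94, 99, 100 };

double p95 = HistogramQuantile(0.95, le, counts);
Console.WriteLine($"p95 is about {p95:F2} s; alert fires: {p95 > 0.5}");
```

With these numbers the 95th request falls in the (0.5, 1.0] bucket, so the estimate lands at roughly 0.6 s and the `> 0.5` condition holds. The result is only as precise as the bucket layout, which is worth keeping in mind when choosing thresholds near a bucket boundary.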
    

Flowchart: alert flow

Prometheus (PromQL evaluation) → Alertmanager (send alert) → Email/Slack/Webhook → DevOps team (acknowledgment)

📦 One-Command Startup and Project Notes

  1. Startup script start.sh (project root)
   #!/usr/bin/env bash
   cp appsettings.Development.json appsettings.json
   docker-compose up -d
  2. README.md
    • Switching environments: edit appsettings.{Environment}.json
    • Collector configuration: otel-collector-config.yaml
    • Dashboard location: grafana/dashboards/*.json
    • Load-test examples:
     siege -c10 -t30S http://localhost:5000/api/health
     wrk -c10 -t2 -d30s http://localhost:5000/api/health