Analyzing the Monitoring Pipeline for Microservices (Part 1)

Published: 2025-09-04

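Measuring API QPS in a microservice system calls for a monitoring strategy designed for a distributed environment. The approaches below are efficient and reliable ways to do it: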

I. Architecture-level approaches (recommended)

1. Unified monitoring at the API gateway
graph LR
    Client -->|request| Gateway[API gateway]
    Gateway -->|route| ServiceA[Service A]
    Gateway -->|route| ServiceB[Service B]
    Gateway -->|metrics export| Prometheus
    Prometheus --> Grafana

Implementation:

// Spring Cloud Gateway configuration
@Bean
public RouteLocator customRouteLocator(RouteLocatorBuilder builder, MeterRegistry registry) {
    return builder.routes()
        .route("service_a", r -> r.path("/api/a/**")
            .filters(f -> f.filter(new MetricsFilter(registry, "service_a")))
            .uri("lb://SERVICE-A"))
        .route("service_b", r -> r.path("/api/b/**")
            .filters(f -> f.filter(new MetricsFilter(registry, "service_b")))
            .uri("lb://SERVICE-B"))
        .build();
}

// Custom metrics filter: increments one counter per routed request, tagged by service
public class MetricsFilter implements GatewayFilter {
    private final Counter counter;
    
    public MetricsFilter(MeterRegistry registry, String serviceName) {
        this.counter = Counter.builder("api_requests")
            .tag("service", serviceName)
            .register(registry);
    }

    @Override
    public Mono<Void> filter(ServerWebExchange exchange, GatewayFilterChain chain) {
        counter.increment();
        return chain.filter(exchange);
    }
}
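
The filter above keeps one monotonically increasing counter per service tag. As a minimal sketch of that idea without any framework, the class below keeps a thread-safe count per service name using only the JDK. `ServiceRequestCounters` and its methods are hypothetical names for illustration and are not part of Micrometer or Spring Cloud Gateway.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical stand-in for a tagged counter: one ever-increasing count per service. */
final class ServiceRequestCounters {
    private final Map<String, LongAdder> counts = new ConcurrentHashMap<>();

    /** Called once per routed request, like counter.increment() in the filter. */
    public void increment(String serviceName) {
        counts.computeIfAbsent(serviceName, k -> new LongAdder()).increment();
    }

    /** Current total for one service; 0 if the service has not been seen yet. */
    public long total(String serviceName) {
        LongAdder adder = counts.get(serviceName);
        return adder == null ? 0L : adder.sum();
    }
}
```

The counter only ever grows; QPS is derived later by the monitoring backend from how fast the total increases.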

2. Service mesh monitoring (e.g. Istio)

# PromQL query for per-service QPS
sum(rate(istio_requests_total{reporter="destination"}[1m])) by (destination_service, destination_workload)

II. Code-level integration

1. Spring Boot Actuator + Micrometer
# application.yml
management:
  endpoints:
    web:
      exposure:
        include: prometheus
  metrics:
    tags:
      application: ${spring.application.name}

@RestController
public class ApiController {
    
    private final Counter counter;
    
    public ApiController(MeterRegistry registry) {
        this.counter = Counter.builder("api.calls")
            .tag("endpoint", "getData")
            .register(registry);
    }
    
    @GetMapping("/data")
    public String getData() {
        counter.increment();
        // business logic
        return "response";
    }
}

III. Log analysis

1. ELK Stack

# Logstash configuration
filter {
  grok {
    match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} %{WORD:method} %{URIPATH:api_path} %{NUMBER:status}" }
  }
  metrics {
    meter => [ "qps_%{api_path}" ]
    add_tag => [ "metric" ]
    ignore_older_than => 10
  }
}
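
To illustrate what the grok pattern extracts, here is a minimal JDK-only sketch that parses access-log lines and counts requests per path. It assumes each line has the layout "timestamp METHOD path status"; the regular expression is a rough approximation of the grok pattern, and `AccessLogCounter` is a hypothetical name, not part of Logstash.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AccessLogCounter {
    // Assumed line layout: "<ISO-8601 timestamp> <METHOD> <path> <status>"
    private static final Pattern LINE =
        Pattern.compile("^(\\S+)\\s+([A-Z]+)\\s+(/\\S*)\\s+(\\d{3})$");

    /** Returns the request path of a well-formed line, or empty if the line does not match. */
    static Optional<String> pathOf(String line) {
        Matcher m = LINE.matcher(line.trim());
        return m.matches() ? Optional.of(m.group(3)) : Optional.empty();
    }

    /** Counts requests per path, skipping lines that do not match the layout. */
    static Map<String, Long> countByPath(List<String> lines) {
        Map<String, Long> counts = new TreeMap<>();
        for (String line : lines) {
            pathOf(line).ifPresent(p -> counts.merge(p, 1L, Long::sum));
        }
        return counts;
    }
}
```

Dividing each count by the number of seconds the log lines span gives an average QPS per path for that period.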

2. Real-time stream processing

// Compute QPS in real time with Flink
DataStream<LogEntry> logs = env.addSource(new KafkaSource());
logs.keyBy(LogEntry::getApiPath)
    .window(SlidingProcessingTimeWindows.of(Time.seconds(60), Time.seconds(1)))
    .process(new QpsCalculator());
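
The Flink job above uses a 60-second window that slides every second. As a rough sketch of the same calculation outside Flink, the class below keeps one bucket per second in a ring buffer and averages over the window. `SlidingWindowQps` is a hypothetical name, timestamps are passed in explicitly to keep the example deterministic, and late events older than the slot's current second are simply dropped, which is an assumption of this sketch.

```java
import java.util.Arrays;

final class SlidingWindowQps {
    private final int windowSeconds;
    private final long[] bucketSecond; // epoch second each slot currently represents
    private final long[] bucketCount;  // requests recorded during that second

    SlidingWindowQps(int windowSeconds) {
        if (windowSeconds <= 0) {
            throw new IllegalArgumentException("windowSeconds must be positive");
        }
        this.windowSeconds = windowSeconds;
        this.bucketSecond = new long[windowSeconds];
        this.bucketCount = new long[windowSeconds];
        Arrays.fill(bucketSecond, -1L); // -1 marks an unused slot
    }

    /** Records one request that arrived at the given epoch second. */
    synchronized void record(long epochSecond) {
        int slot = (int) Math.floorMod(epochSecond, (long) windowSeconds);
        if (bucketSecond[slot] > epochSecond) {
            return; // late event for a slot already reused by a newer second
        }
        if (bucketSecond[slot] != epochSecond) {
            bucketSecond[slot] = epochSecond; // slot held an expired second: reuse it
            bucketCount[slot] = 0;
        }
        bucketCount[slot]++;
    }

    /** Average requests per second over the window that ends at nowEpochSecond. */
    synchronized double qps(long nowEpochSecond) {
        long total = 0;
        for (int i = 0; i < windowSeconds; i++) {
            long s = bucketSecond[i];
            if (s >= 0 && s <= nowEpochSecond && s > nowEpochSecond - windowSeconds) {
                total += bucketCount[i];
            }
        }
        return (double) total / windowSeconds;
    }
}
```

In a keyed stream, one such counter per API path would play the role of the per-key window state.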

IV. Cloud-native approaches

1. AWS CloudWatch

// CloudWatch Metrics configuration
{
  "metrics": [
    [ "AWS/ApiGateway", "Count", "ApiName", "my-api" ],
    [ ".", "5xxError", ".", "." ],
    [ ".", "4xxError", ".", "." ]
  ],
  "period": 60,
  "stat": "Sum"
}

2. Azure Application Insights

// ASP.NET Core integration
services.AddApplicationInsightsTelemetry(o =>
{
    o.RequestCollectionOptions.TrackExceptions = true;
});

V. Distributed tracing

1. Jaeger + Prometheus

# Prometheus configuration
scrape_configs:
  - job_name: 'jaeger'
    metrics_path: '/metrics'
    static_configs:
      - targets: ['jaeger:14269']

# QPS for a specific API
sum(rate(jaeger_spans_total{operation="GET /api/data"}[1m]))

Recommended combined setup

graph TD
    A[API gateway] -->|metrics| B(Prometheus)
    C[Microservices] -->|metrics| B
    D[Service mesh] -->|metrics| B
    B --> E[Grafana]
    F[Logging system] -->|logs| G(ELK/Flink)
    G --> E
    E -->|alerts| H(AlertManager)

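Implementation recommendations:

  1. Entry-layer monitoring: implement baseline QPS monitoring at the API gateway (covers roughly 80% of scenarios)
  2. Service-level monitoring: instrument critical services with Micrometer
  3. Log backup: use ELK to analyze historical data
  4. Cloud integration: use the native monitoring tools of AWS, Azure, or GCP
  5. End-to-end tracing: use Jaeger or Zipkin to analyze cross-service calls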

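Key trade-offs:

  • Gateway-level monitoring: quick to deploy and covers all inbound traffic
  • Service-level monitoring: finer-grained, but requires code changes
  • Service mesh: non-intrusive, but requires supporting infrastructure
  • Cloud services: work out of the box, but bring vendor lock-in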

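In production, a layered strategy is recommended: combine global monitoring at the gateway with fine-grained monitoring at the service level, and build a unified dashboard in Grafana.

Detailed usage of the code-level integration: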

pom.xml
<dependencies>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-actuator</artifactId>
    </dependency>
    <dependency>
        <groupId>io.micrometer</groupId>
        <artifactId>micrometer-registry-prometheus</artifactId>
    </dependency>
</dependencies>


application.yml
management:
  endpoints:
    web:
      exposure:
        include: health,info,prometheus
  metrics:
    tags:
      application: ${spring.application.name}
    distribution:
      percentiles-histogram:
        http.server.requests: true




@RestController
public class DemoController {
    
    @GetMapping("/api/data")
    public String getData() {
        return "response";
    }
}



@Configuration
public class CustomMetrics {
    @Bean
    MeterRegistryCustomizer<MeterRegistry> metricsCommonTags() {
        return registry -> registry.config().commonTags("region", "us-east");
    }
}

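How it works:

  1. Raw metric data is read from the /actuator/prometheus endpoint exposed by Actuator
  2. The http_server_requests_seconds_count metric automatically records every HTTP request
  3. QPS is computed with PromQL: rate(http_server_requests_seconds_count[1m])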
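
To make step 3 concrete, here is a minimal sketch of the arithmetic behind a rate over counter samples: the total increase divided by the elapsed time, with a drop in value treated as a counter reset. It is a simplification, since the real PromQL rate() also extrapolates to the window boundaries. `CounterRate` and `Sample` are hypothetical names for illustration.

```java
import java.util.List;

final class CounterRate {
    /** One scraped sample of a monotonically increasing counter. */
    static final class Sample {
        final double epochSeconds;
        final double value;

        Sample(double epochSeconds, double value) {
            this.epochSeconds = epochSeconds;
            this.value = value;
        }
    }

    /**
     * Simplified per-second rate over samples ordered by time.
     * A decrease between two samples is treated as a restart from zero.
     */
    static double perSecond(List<Sample> samples) {
        if (samples.size() < 2) {
            return 0.0; // a rate needs at least two samples
        }
        double increase = 0.0;
        for (int i = 1; i < samples.size(); i++) {
            double prev = samples.get(i - 1).value;
            double cur = samples.get(i).value;
            increase += (cur >= prev) ? cur - prev : cur; // reset: count what accrued since restart
        }
        double elapsed = samples.get(samples.size() - 1).epochSeconds
            - samples.get(0).epochSeconds;
        return elapsed > 0 ? increase / elapsed : 0.0;
    }
}
```

For example, a counter that goes from 100 to 1300 over 60 seconds yields 20 requests per second.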

Four ways to view QPS:

Option 1: the built-in Prometheus UI

# QPS for all APIs
sum(rate(http_server_requests_seconds_count[1m])) by (uri)

# QPS for a specific API
rate(http_server_requests_seconds_count{uri="/api/data"}[1m])

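Option 2: Grafana dashboard

  1. Create a new dashboard
  2. Add a panel and select the Prometheus data source
  3. Enter the PromQL queries shown for the Prometheus UI

Option 3: Spring Boot Admin

  1. Deploy a Spring Boot Admin server
  2. Open the Metrics tab on the application's detail page
  3. Search for the "http.server.requests" metric

Option 4: local development and debugging

curl http://localhost:8080/actuator/prometheus | grep http_server_requests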

Key metrics:

  • http_server_requests_seconds_count: counter of total requests
  • http_server_requests_seconds_max: maximum response time
  • http_server_requests_seconds_sum: sum of all response times
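
The count and sum metrics combine into an average response time: the increase in the sum divided by the increase in the count over the same interval. The sketch below shows that arithmetic under the assumption that both values were sampled at the start and end of the interval; `LatencyStats` is a hypothetical name.

```java
final class LatencyStats {
    /**
     * Average response time in seconds over an interval, from the sum and count
     * metrics sampled at its start and end. Returns NaN when no requests arrived.
     */
    static double averageSeconds(double sumStart, double sumEnd,
                                 long countStart, long countEnd) {
        long requests = countEnd - countStart;
        if (requests <= 0) {
            return Double.NaN; // no traffic (or a counter reset): average is undefined
        }
        return (sumEnd - sumStart) / requests;
    }
}
```

With 300 new requests that added 15 seconds to the sum, the average is 0.05 seconds per request.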

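Advanced configuration tips:

  1. Add the @Timed annotation for method-level monitoring
  2. Use Micrometer's @Counted annotation for custom business metrics
  3. Enable automatic timing via management.metrics.web.server.request.autotime.enabled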


<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>io.micrometer</groupId>
            <artifactId>micrometer-registry-prometheus</artifactId>
            <version>1.6.3</version>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-actuator</artifactId>
            <version>2.3.4.RELEASE</version>
        </dependency>
    </dependencies>
</dependencyManagement>



management:
  endpoints:
    web:
      exposure:
        include: "*"
  metrics:
    tags:
      application: ${spring.application.name}


scrape_configs:
  - job_name: 'nacos-service-discovery'
    metrics_path: '/actuator/prometheus'
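    # Assumption: nacos_sd_configs is not one of the service-discovery mechanisms in
    # upstream Prometheus; this block presumes a build or adapter that adds it.
    nacos_sd_configs: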
      - server_address: 'nacos-server:8848'
        group_name: 'DEFAULT_GROUP'
        namespace_id: 'public'
        refresh_interval: 30s


@Configuration
public class CustomMetrics {
    @Bean
    MeterRegistryCustomizer<MeterRegistry> configurer(
        @Value("${spring.application.name}") String appName) {
        return registry -> registry.config().commonTags("application", appName);
    }
    
    @Bean
    Counter orderCounter(MeterRegistry registry) {
        return Counter.builder("api.orders.count")
            .description("Total order requests")
            .register(registry);
    }
}


<dependencies>
    <!-- Spring Cloud Alibaba -->
    <dependency>
        <groupId>com.alibaba.cloud</groupId>
        <artifactId>spring-cloud-starter-alibaba-nacos-discovery</artifactId>
    </dependency>
    
    <!-- Core monitoring dependencies -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-actuator</artifactId>
    </dependency>
    <dependency>
        <groupId>io.micrometer</groupId>
        <artifactId>micrometer-registry-prometheus</artifactId>
    </dependency>
</dependencies>

