1. Overview

Prometheus is an open-source monitoring and alerting system that is particularly well suited to cloud-native environments. This post walks through deploying a complete Prometheus monitoring stack on a Kubernetes cluster, including Prometheus Server, Node Exporter, kube-state-metrics, and Grafana.

1.1 System Architecture

The Prometheus monitoring stack consists of the following components:

- Prometheus Server: the core monitoring server, responsible for scraping and storing metrics
- Node Exporter: node-level metrics collector
- Kube-state-metrics: collector for Kubernetes cluster-state metrics
- Grafana: data visualization and dashboards

1.1.1 Architecture Diagram
1.2 Environment Preparation

| IP | Hostname | Notes |
|---|---|---|
| 192.168.48.11 | master1 | master node, k8s 1.32.7 |
| 192.168.48.12 | master2 | master node, k8s 1.32.7 |
| 192.168.48.13 | master3 | master node, k8s 1.32.7 |
| 192.168.48.14 | node01 | worker node, k8s 1.32.7 |
| 192.168.48.15 | node02 | worker node, k8s 1.32.7 |
| 192.168.48.16 | node03 | worker node, k8s 1.32.7 |
| 192.168.48.19 | database | Harbor registry, NFS server |
This deployment targets a highly available k8s cluster, and all images are pulled from Chinese mirrors, so it works even without a Harbor registry. If an image pull times out, leave a comment and I'll update the post promptly. The NFS server is a hard requirement; if you use a different storage backend such as Ceph or hostPath, adjust the YAML manifests yourself.
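The PVC manifests later in this post assume an NFS-backed StorageClass named nfs-client. A quick sanity check before starting (the StorageClass name is this post's assumption; rename to match your provisioner):

```bash
# Confirm the StorageClass used by the PVCs below exists
kubectl get storageclass nfs-client

# Confirm the NFS provisioner pod is running (its name varies by install)
kubectl get pods -A | grep -i nfs
```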
For setting up NFS shared storage for k8s, see this earlier post:
For building a k8s cluster, see this earlier post:
Deploying a k8s 1.32.7 cluster on openEuler 24.03 (one master, two workers)
For building a highly available k8s cluster, see this earlier post:
Deploying a highly available k8s 1.32.7 cluster on openEuler 24.03 (three masters, three workers)
2. Deploying Prometheus

2.1 Create the Namespace
vim prometheus-namespace.yaml
```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: monitor
  labels:
    name: monitor
    purpose: monitoring
```
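A quick apply-and-verify for this manifest (plain kubectl, nothing beyond what the file defines):

```bash
kubectl apply -f prometheus-namespace.yaml
kubectl get ns monitor --show-labels
```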
2.2 Create the ConfigMap Resource
vim prometheus-configmap.yaml
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: monitor
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
      evaluation_interval: 15s
    scrape_configs:
      # Scrape Prometheus itself
      - job_name: 'prometheus'
        kubernetes_sd_configs:
          - role: endpoints
            namespaces:
              names: [monitor]
        relabel_configs:
          - source_labels: [__meta_kubernetes_service_name]
            regex: prometheus-svc
            action: keep
          - source_labels: [__meta_kubernetes_endpoint_port_name]
            regex: web
            action: keep
      # Scrape CoreDNS
      - job_name: 'coredns'
        kubernetes_sd_configs:
          - role: endpoints
            namespaces:
              names: [kube-system]
        relabel_configs:
          - source_labels: [__meta_kubernetes_service_name]
            regex: kube-dns
            action: keep
          - source_labels: [__meta_kubernetes_endpoint_port_name]
            regex: metrics
            action: keep
      # Scrape kube-apiserver
      - job_name: 'kube-apiserver'
        scheme: https
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          insecure_skip_verify: false
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        kubernetes_sd_configs:
          - role: endpoints
            namespaces:
              names: [default, kube-system]
        relabel_configs:
          - source_labels: [__meta_kubernetes_service_name]
            regex: kubernetes
            action: keep
          - source_labels: [__meta_kubernetes_endpoint_port_name]
            regex: https
            action: keep
      # Scrape node-exporter
      - job_name: 'node-exporter'
        kubernetes_sd_configs:
          - role: node
        relabel_configs:
          - source_labels: [__address__]
            regex: '(.*):10250'
            replacement: '${1}:9100'
            target_label: __address__
            action: replace
      # Scrape cAdvisor
      - job_name: 'cadvisor'
        kubernetes_sd_configs:
          - role: node
        scheme: https
        tls_config:
          insecure_skip_verify: true
          ca_file: '/var/run/secrets/kubernetes.io/serviceaccount/ca.crt'
        bearer_token_file: '/var/run/secrets/kubernetes.io/serviceaccount/token'
        relabel_configs:
          - target_label: __metrics_path__
            replacement: /metrics/cadvisor
```
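Since the Deployment in 2.3 starts Prometheus with --web.enable-lifecycle, configuration changes can be hot-reloaded instead of restarting the pod. A sketch, assuming you keep a standalone copy of prometheus.yml and use the NodePort 32224 exposed later:

```bash
# Optional: lint the config locally first (requires the promtool binary)
promtool check config prometheus.yml

# Push the updated ConfigMap, then ask Prometheus to reload it
kubectl apply -f prometheus-configmap.yaml
# Note: ConfigMap changes can take up to a minute to propagate into the pod
curl -X POST http://192.168.48.10:32224/-/reload
```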
2.3 Create the ServiceAccount, ClusterRole, ClusterRoleBinding, Service, Deployment, Ingress, and PersistentVolumeClaim
vim prometheus.yaml
```yaml
# ServiceAccount
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: monitor
---
# ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
  - apiGroups:
      - ""
    resources:
      - nodes
      - services
      - endpoints
      - pods
      - nodes/proxy
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - "networking.k8s.io"
    resources:
      - ingresses
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - ""
    resources:
      - configmaps
      - nodes/metrics
    verbs:
      - get
  - nonResourceURLs:
      - /metrics
    verbs:
      - get
---
# ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
  - kind: ServiceAccount
    name: prometheus
    namespace: monitor
---
# Service
apiVersion: v1
kind: Service
metadata:
  name: prometheus-svc
  namespace: monitor
  labels:
    app: prometheus
  annotations:
    prometheus_io_scrape: "true"  # annotation that lets Prometheus discover this service
spec:
  selector:
    app: prometheus
  type: NodePort
  ports:
    - name: web
      nodePort: 32224
      port: 9090
      targetPort: http
---
# Ingress
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: prometheus-ingress
  namespace: monitor
spec:
  ingressClassName: nginx
  rules:
    - host: www.myprometheus.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: prometheus-svc
                port:
                  number: 9090
---
# PersistentVolumeClaim
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: prometheus-pvc  # PVC name
  namespace: monitor
spec:
  accessModes:
    - ReadWriteOnce  # access mode (options: ReadWriteOnce/ReadOnlyMany/ReadWriteMany)
  resources:
    requests:
      storage: 2Gi  # requested storage capacity
  storageClassName: nfs-client  # StorageClass (adjust for your cluster)
---
# Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus
  namespace: monitor
  labels:
    app: prometheus
spec:
  selector:
    matchLabels:
      app: prometheus
  replicas: 1
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      serviceAccountName: prometheus
      initContainers:
        - name: "change-permission-of-directory"
          image: swr.cn-north-4.myhuaweicloud.com/ddn-k8s/quay.io/prometheus/busybox:latest
          command: ["/bin/sh"]
          args: ["-c", "chown -R 65534:65534 /prometheus"]
          securityContext:
            privileged: true
          volumeMounts:
            - mountPath: "/etc/prometheus"
              name: config-volume
            - mountPath: "/prometheus"
              name: data
      containers:
        - image: swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/prom/prometheus:latest
          name: prometheus
          args:
            - "--config.file=/etc/prometheus/prometheus.yml"  # path to the Prometheus config file
            - "--storage.tsdb.path=/prometheus"  # TSDB storage path
            - "--web.enable-lifecycle"  # enable hot reload via: curl -X POST localhost:9090/-/reload
            - "--web.console.libraries=/usr/share/prometheus/console_libraries"
            - "--web.console.templates=/usr/share/prometheus/consoles"
          ports:
            - containerPort: 9090
              name: http
          volumeMounts:
            - mountPath: "/etc/prometheus"
              name: config-volume
            - mountPath: "/prometheus"
              name: data
          resources:
            requests:
              cpu: 100m
              memory: 512Mi
            limits:
              cpu: 100m
              memory: 512Mi
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: prometheus-pvc
        - name: config-volume
          configMap:
            name: prometheus-config
```
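Before checking the Targets page in the UI, it can help to confirm the RBAC rules actually took effect; kubectl auth can-i supports impersonating the ServiceAccount:

```bash
# Each command should print "yes" once the ClusterRoleBinding is applied
kubectl auth can-i list nodes --as=system:serviceaccount:monitor:prometheus
kubectl auth can-i list endpoints --as=system:serviceaccount:monitor:prometheus
kubectl auth can-i get nodes/metrics --as=system:serviceaccount:monitor:prometheus
```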
3. Deploying the Node Exporter Component

3.1 Create the DaemonSet Resource

vim node-exporter-daemonset.yaml
```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: monitor
  labels:
    app: node-exporter
spec:
  selector:
    matchLabels:
      app: node-exporter
  template:
    metadata:
      labels:
        app: node-exporter
    spec:
      hostPID: true
      hostIPC: true
      hostNetwork: true
      nodeSelector:
        kubernetes.io/os: linux
      containers:
        - name: node-exporter
          image: docker.io/prom/node-exporter:latest
          args:
            - --web.listen-address=$(HOSTIP):9100
            - --path.procfs=/host/proc
            - --path.sysfs=/host/sys
            - --path.rootfs=/host/root
            - --collector.filesystem.ignored-mount-points=^/(dev|proc|sys|var/lib/docker/.+)($|/)
            - --collector.filesystem.ignored-fs-types=^(autofs|binfmt_misc|cgroup|configfs|debugfs|devpts|devtmpfs|fusectl|hugetlbfs|mqueue|overlay|proc|procfs|pstore|rpc_pipefs|securityfs|sysfs|tracefs)$
          ports:
            - containerPort: 9100
          env:
            - name: HOSTIP
              valueFrom:
                fieldRef:
                  fieldPath: status.hostIP
          resources:
            requests:
              cpu: 150m
              memory: 180Mi
            limits:
              cpu: 150m
              memory: 180Mi
          securityContext:
            runAsNonRoot: true
            runAsUser: 65534
          volumeMounts:
            - name: proc
              mountPath: /host/proc
            - name: sys
              mountPath: /host/sys
            - name: root
              mountPath: /host/root
              mountPropagation: HostToContainer
              readOnly: true
      tolerations:
        - operator: "Exists"
      volumes:
        - name: proc
          hostPath:
            path: /proc
        - name: dev
          hostPath:
            path: /dev
        - name: sys
          hostPath:
            path: /sys
        - name: root
          hostPath:
            path: /
```
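Because the DaemonSet runs with hostNetwork: true, every node exposes metrics directly on port 9100. A spot check against one of the masters from the table in 1.2:

```bash
# A healthy node-exporter returns plain-text metrics immediately
curl -s http://192.168.48.11:9100/metrics | head -n 5
```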
3.2 Create the Service

vim node-exporter-svc.yaml
```yaml
apiVersion: v1
kind: Service
metadata:
  name: node-exporter
  namespace: monitor
  labels:
    app: node-exporter
spec:
  selector:
    app: node-exporter
  ports:
    - name: metrics
      port: 9100
      targetPort: 9100
  clusterIP: None  # headless Service (targets are reached directly via Pod IPs)
```
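With clusterIP: None, the Service's DNS name resolves to the individual Pod IPs instead of a single virtual IP. One way to see this, using a throwaway busybox pod (the image tag is an arbitrary choice):

```bash
kubectl run dns-test --rm -it --restart=Never -n monitor \
  --image=busybox:1.36 -- nslookup node-exporter.monitor.svc.cluster.local
```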
4. Deploying the Kube-state-metrics Component

4.1 Create the ServiceAccount, ClusterRole, ClusterRoleBinding, Deployment, and Service

vim kube-state-metrics.yaml
```yaml
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kube-state-metrics
  namespace: monitor
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kube-state-metrics
rules:
  - apiGroups: [""]
    resources: ["nodes", "pods", "services", "resourcequotas", "replicationcontrollers", "limitranges", "persistentvolumeclaims", "persistentvolumes", "namespaces", "endpoints"]
    verbs: ["list", "watch"]
  - apiGroups: ["apps"]
    resources: ["daemonsets", "deployments", "replicasets", "statefulsets"]
    verbs: ["list", "watch"]
  - apiGroups: ["batch"]
    resources: ["cronjobs", "jobs"]
    verbs: ["list", "watch"]
  - apiGroups: ["autoscaling"]
    resources: ["horizontalpodautoscalers"]
    verbs: ["list", "watch"]
  - apiGroups: ["networking.k8s.io"]
    resources: ["ingresses"]
    verbs: ["list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: kube-state-metrics
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kube-state-metrics
subjects:
  - kind: ServiceAccount
    name: kube-state-metrics
    namespace: monitor
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kube-state-metrics
  namespace: monitor
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kube-state-metrics
  template:
    metadata:
      labels:
        app: kube-state-metrics
    spec:
      serviceAccountName: kube-state-metrics
      containers:
        - name: kube-state-metrics
          image: swr.cn-north-4.myhuaweicloud.com/ddn-k8s/registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.9.2
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  annotations:
    prometheus.io/scrape: 'true'
  name: kube-state-metrics
  namespace: monitor
  labels:
    app: kube-state-metrics
spec:
  ports:
    - name: kube-state-metrics
      port: 8080
      protocol: TCP
  selector:
    app: kube-state-metrics
```
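To confirm kube-state-metrics is serving data before wiring up dashboards, a port-forward plus curl is enough; kube_pod_status_phase is one of its standard metrics:

```bash
kubectl port-forward -n monitor svc/kube-state-metrics 8080:8080 &
curl -s http://127.0.0.1:8080/metrics | grep -m 3 kube_pod_status_phase
kill %1  # stop the background port-forward
```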
5. Deploying the Grafana Visualization Platform

5.1 Create the PersistentVolumeClaim, Deployment, and Service
vim grafana.yaml
```yaml
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: grafana-pvc  # PVC name
  namespace: monitor
spec:
  accessModes:
    - ReadWriteOnce  # access mode (options: ReadWriteOnce/ReadOnlyMany/ReadWriteMany)
  resources:
    requests:
      storage: 2Gi  # requested storage capacity
  storageClassName: nfs-client  # StorageClass (adjust for your cluster)
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: grafana-server
  namespace: monitor
spec:
  replicas: 1
  selector:
    matchLabels:
      task: monitoring
      k8s-app: grafana
  template:
    metadata:
      labels:
        task: monitoring
        k8s-app: grafana
    spec:
      containers:
        - name: grafana
          image: grafana/grafana:latest
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 3000
              protocol: TCP
          volumeMounts:
            - mountPath: /var/lib/grafana/
              name: grafana-data
          env:
            - name: INFLUXDB_HOST  # not used by this stack (no InfluxDB deployed)
              value: monitoring-influxdb
            - name: GF_SERVER_HTTP_PORT
              value: "3000"
            - name: GF_AUTH_BASIC_ENABLED
              value: "false"
            - name: GF_AUTH_ANONYMOUS_ENABLED
              value: "true"
            - name: GF_AUTH_ANONYMOUS_ORG_ROLE
              value: Admin
            - name: GF_SERVER_ROOT_URL
              value: /
      volumes:
        - name: grafana-data
          persistentVolumeClaim:
            claimName: grafana-pvc
      affinity:  # scheduling preference (optional)
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 1
              preference:
                matchExpressions:
                  - key: node-role.kubernetes.io/monitoring
                    operator: Exists
---
apiVersion: v1
kind: Service
metadata:
  labels:
    kubernetes.io/cluster-service: 'true'
    kubernetes.io/name: monitoring-grafana
  name: grafana-svc
  namespace: monitor
spec:
  ports:
    - port: 80
      targetPort: 3000
      nodePort: 31091
  selector:
    k8s-app: grafana
  type: NodePort
```
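Because the PVC depends on the nfs-client StorageClass assumed in 1.2, it is worth confirming it binds before troubleshooting a Pending Grafana pod:

```bash
kubectl get pvc -n monitor grafana-pvc   # STATUS should be Bound
# If it stays Pending, the provisioner events usually explain why:
kubectl describe pvc -n monitor grafana-pvc
```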
6. Deployment Commands

Deploy the components in the following order:
```bash
# 1. Create the namespace
kubectl apply -f prometheus-namespace.yaml

# 2. Deploy the Prometheus configuration
kubectl apply -f prometheus-configmap.yaml

# 3. Deploy the Prometheus server
kubectl apply -f prometheus.yaml

# 4. Deploy kube-state-metrics
kubectl apply -f kube-state-metrics.yaml

# 5. Deploy Node Exporter
kubectl apply -f node-exporter-daemonset.yaml
kubectl apply -f node-exporter-svc.yaml

# 6. Deploy Grafana
kubectl apply -f grafana.yaml
```
Check the pod status:
```
[root@master1 prometheus]# kubectl get pod -n monitor
NAME                                 READY   STATUS    RESTARTS   AGE
grafana-server-64c9777c7b-drgdd      1/1     Running   0          110m
kube-state-metrics-6db447664-6r2wp   1/1     Running   0          110m
node-exporter-ccwk8                  1/1     Running   0          110m
node-exporter-fbq22                  1/1     Running   0          110m
node-exporter-hbtm6                  1/1     Running   0          110m
node-exporter-ndbhh                  1/1     Running   0          110m
node-exporter-sbb4p                  1/1     Running   0          110m
node-exporter-xd467                  1/1     Running   0          110m
prometheus-7cd9944dc4-lbjwx          1/1     Running   0          110m
```
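It can also help to verify that each Service picked up endpoints, since an empty endpoint list is the usual cause of a dead scrape target:

```bash
kubectl get svc -n monitor
kubectl get endpoints -n monitor
```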
7. Accessing the Services

After the deployment completes, the services can be reached as follows:

Prometheus: http://<node-ip>:32224 or http://www.myprometheus.com (requires name resolution for the Ingress host)

Grafana: http://<node-ip>:31091
Heads-up: 192.168.48.10 is the VIP of my highly available k8s cluster; if your cluster is not HA, use the IP of the host the Pod runs on instead.

Prometheus: http://192.168.48.10:32224

Grafana: http://192.168.48.10:31091/
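To use the Ingress hostname from 2.3, a local hosts entry pointing at wherever your ingress-nginx controller is reachable is enough for testing (the IP below is this cluster's VIP; adjust to your setup):

```bash
# Run on the machine you browse from (Linux/macOS)
echo "192.168.48.10 www.myprometheus.com" | sudo tee -a /etc/hosts
```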
8. Grafana Dashboards

8.1 Configure a Data Source for Grafana

Click Save & test at the bottom of the page; if "Successfully queried the Prometheus API." appears, the data source is working.
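For the data source URL itself, the in-cluster Service DNS name is the natural choice, since Grafana runs in the same namespace as Prometheus. As an alternative to clicking through the UI, the same data source can be declared via Grafana's provisioning file format (a sketch; placing the file under /etc/grafana/provisioning/datasources/ is up to you):

```yaml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy                                  # Grafana proxies queries server-side
    url: http://prometheus-svc.monitor.svc:9090    # in-cluster Service DNS name
    isDefault: true
```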
8.2 Import Dashboards
Dashboard IDs:

- Node monitoring: 16098
- Kubernetes cluster monitoring: 14249