Deploying EFK Log Collection in a K8s Cluster

Published: 2025-05-15

Introduction

  • OS: CentOS 7.9
  • Kernel: 6.3.5-1.el7
  • Kubernetes: v1.26.14
  • Elasticsearch official site
  • This walkthrough tries to avoid the common pitfalls; following the official method verbatim runs into a few problems.

Environment Preparation

  • Prepare Ceph or NFS storage
  • NFS storage installation guide
  • This install uses the official ECK method to deploy EFK (an older release, 7.17.3; most existing production environments still run this version)
  • Adds RBAC permissions and index-template settings so that Kubernetes metadata can be attached to the log output

Install the CRDs and the Operator

kubectl create -f https://download.elastic.co/downloads/eck/1.7.1/crds.yaml
kubectl apply -f https://download.elastic.co/downloads/eck/1.7.1/operator.yaml

Deploy Elasticsearch

apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: quickstart
  namespace: elastic-system
spec:
  version: 7.17.3
  nodeSets:
  - name: masters
    count: 1
    config:
      node.roles: ["master"]
      xpack.ml.enabled: true
    podTemplate:
      spec:
        initContainers:
        - name: sysctl
          securityContext:
            privileged: true
            runAsUser: 0
          command: ['sh', '-c', 'sysctl -w vm.max_map_count=262144']
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data
      spec:
        storageClassName: nfs-dynamic
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 10Gi
  - name: data
    count: 1
    config:
      node.roles: ["data", "ingest", "ml", "transform"]
    podTemplate:
      spec:
        initContainers:
        - name: sysctl
          securityContext:
            privileged: true
            runAsUser: 0
          command: ['sh', '-c', 'sysctl -w vm.max_map_count=262144']
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data
      spec:
        storageClassName: nfs-dynamic
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 50Gi

For production, configure the cluster as described below; this is a test environment, so I took the easy route.

Differences Between Master and Data Nodes

| Aspect | Master node | Data node |
| --- | --- | --- |
| Core responsibility | Manages cluster metadata (indices, shard allocation, node state) | Stores data (primary and replica shards); serves reads and writes (search, aggregations) |
| Role definition | node.roles: ["master"] | node.roles: ["data", "ingest", "ml", "transform"] |
| Resource needs | Low CPU/memory (lightweight metadata management) | High CPU/memory/disk (data processing and computation) |
| High availability | Must be redundant (at least 3 in production, to avoid split-brain) | Scales horizontally (add or remove nodes with data volume and load) |
| Typical work | Cluster coordination, shard allocation, state maintenance | Document writes, search handling, machine-learning tasks |

Production Tuning Recommendations

Master Node Configuration

nodeSets:
- name: masters
  count: 3  # at least 3 in production
  config:
    node.roles: ["master"]
    xpack.ml.enabled: false  # disable non-essential features to save resources

Data Node Role Separation

- name: data-only
  count: 2
  config:
    node.roles: ["data"]  # 专注数据存储

- name: ingest
  count: 2
  config:
    node.roles: ["ingest"]  # 专用写入节点

- name: ml
  count: 1
  config:
    node.roles: ["ml", "transform"]  # 独立计算节点

Verify Elasticsearch After Installation

## Open a second terminal for the following commands, or run this in the background.
kubectl port-forward -n elastic-system services/quickstart-es-http 9200

## Fetch the password
PASSWORD=$(kubectl get secret -n elastic-system quickstart-es-elastic-user -o go-template='{{.data.elastic | base64decode}}')

## Send a test request
curl -u "elastic:$PASSWORD" -k "https://localhost:9200"

Deploy Filebeat

  • Example Filebeat DaemonSet configuration (the ECK Beat resource below deploys Filebeat, not Fluentd)
apiVersion: beat.k8s.elastic.co/v1beta1
kind: Beat
metadata:
  name: filebeat
  namespace: elastic-system
spec:
  type: filebeat
  version: 7.17.3
  elasticsearchRef:
    name: quickstart             # name of the associated Elasticsearch resource
    namespace: elastic-system    # namespace where Elasticsearch runs
  config:
    filebeat.inputs:
      - type: container
        paths:
          - /var/log/containers/*.log
    
    processors:
      - add_kubernetes_metadata: # attach k8s labels and related metadata
          host: ${NODE_NAME} 
          matchers:
          - logs_path:
              logs_path: "/var/log/containers/"
      - drop_fields: # add or remove fields to drop as needed
          fields: ["agent", "ecs", "container", "host","host.name","input", "log", "offset", "stream","kubernetes.namespace","kubernetes.labels.app","kubernetes.node", "kubernetes.pod", "kubernetes.replicaset", "kubernetes.namespace_uid", "kubernetes.labels.pod-template-hash"]
          ignore_missing: true # don't error when a field is missing
      - decode_json_fields:
          fields: ["message"]  # source field to parse
          target: ""           # parse into the root level (flattened fields)
          overwrite_keys: false # whether to overwrite existing keys
          process_array: false # whether to parse array values
          max_depth: 1         # parse only one level of JSON

    output.elasticsearch:
      username: "elastic"  # 使用 Elastic 内置超级用户(生产环境不推荐)
      password: "5ypyQpuC6BB191Si9w1209MM"    # 这里需要改成正确的密码(生产环境建议使用Secret注入)
      index: "filebeat-other-log-%{+yyyy.MM.dd}"
      indices: # # 索引路由规则(按条件分流)
        - index: "filebeat-containers-log-%{+yyyy.MM.dd}"  # 默认索引格式(按日滚动)
          when.or:
            - contains:
                kubernetes.labels.app: "etcd"
        - index: "filebeat-services-log-%{+yyyy.MM.dd}"
          when.contains:
            kubernetes.labels.type: "service"
      pipelines: # route documents through Ingest Pipelines
        - pipeline: "filebeat-containers-log-pipeline"
          when.or:
            - contains:
                kubernetes.labels.app: "etcd"
        - pipeline: "filebeat-services-log-pipeline"
          when.contains:
            kubernetes.labels.type: "service"
    setup.template.settings:
      index:
        number_of_shards: 1    # one primary shard
        number_of_replicas: 0  # zero replicas (use at least 1 in production)
    
    setup.template.enabled: true      # template feature must be enabled
    setup.template.overwrite: true    # force-overwrite the old template
    setup.template.name: "filebeat-log-template"  # custom template name
    setup.template.pattern: "filebeat-*-log-*"    # match all filebeat-*-log-* indices
    setup.ilm.enabled: false           # disable ILM (compatible with the manual template config)
  daemonSet:
    podTemplate:
      spec:
        serviceAccountName: elastic-beat-filebeat-quickstart
        automountServiceAccountToken: true
        dnsPolicy: ClusterFirstWithHostNet
        hostNetwork: true
        securityContext:
          runAsUser: 0
        containers:
        - name: filebeat
          env: 
          - name: NODE_NAME
            valueFrom:
              fieldRef:
                fieldPath: spec.nodeName
          volumeMounts:
          - name: varlogcontainers
            mountPath: /var/log/containers
          - name: varlogpods
            mountPath: /var/log/pods
          - name: varlibdockercontainers
            mountPath: /var/lib/containers
        volumes:
        - name: varlogcontainers
          hostPath:
            path: /var/log/containers
        - name: varlogpods
          hostPath:
            path: /var/log/pods
        - name: varlibdockercontainers
          hostPath:
            path: /var/lib/containers
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: elastic-beat-filebeat-quickstart
  namespace: elastic-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: elastic-beat-autodiscover-binding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: elastic-beat-autodiscover
subjects:
- kind: ServiceAccount
  name: elastic-beat-filebeat-quickstart
  namespace: elastic-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: elastic-beat-autodiscover
rules:
- apiGroups:
  - ""
  resources:
  - nodes
  - namespaces
  - events
  - pods
  verbs:
  - get
  - list
  - watch
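Note that the `pipelines` section of the Beat config references `filebeat-containers-log-pipeline` and `filebeat-services-log-pipeline` by name; those Ingest Pipelines must already exist in Elasticsearch (created via `PUT _ingest/pipeline/<name>`), otherwise documents routed through them will fail to index. A minimal hypothetical pipeline body, just to show the shape (the processor is an example, not the author's actual pipeline):

```json
{
  "description": "Example pipeline for container logs (hypothetical processors)",
  "processors": [
    {
      "set": {
        "field": "log_source",
        "value": "containers"
      }
    }
  ]
}
```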

Verify That Filebeat Is Shipping Logs

PASSWORD=$(kubectl get secret -n elastic-system quickstart-es-elastic-user -o go-template='{{.data.elastic | base64decode}}')
curl -u "elastic:$PASSWORD" -k "https://localhost:9200/filebeat-*/_search"

Deploy Kibana

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: kibana-data-pvc
  namespace: elastic-system
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
  storageClassName: nfs-dynamic
---
apiVersion: kibana.k8s.elastic.co/v1
kind: Kibana
metadata:
  name: quickstart
  namespace: elastic-system
spec:
  version: 7.17.3
  count: 1
  elasticsearchRef:
    name: quickstart
    namespace: elastic-system
  http:
    tls:
      selfSignedCertificate:
        disabled: true
  config:
    i18n.locale: "zh-CN" # enable the Chinese UI localization
  podTemplate:
    spec:
      containers:
      - name: kibana
        env:
          - name: NODE_OPTIONS
            value: "--max-old-space-size=2048"
        volumeMounts:
          - mountPath: /usr/share/kibana/data
            name: kibana-data 
      volumes:
        - name: kibana-data
          persistentVolumeClaim:
            claimName: kibana-data-pvc 
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: kibana-ingress
  namespace: elastic-system
spec:
  ingressClassName: nginx
  rules:
  - host: kibana.deployers.cn
    http:
      paths:
      - backend:
          service:
            name: quickstart-kb-http
            port:
              name: http
        path: /
        pathType: Prefix
  tls:
  - hosts:
    - kibana.deployers.cn
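
The `tls` section above has no `secretName`, so ingress-nginx falls back to its default self-signed certificate. With a real certificate stored in a TLS Secret (the secret name below is hypothetical), it would read:

```yaml
  tls:
  - hosts:
    - kibana.deployers.cn
    secretName: kibana-tls-cert  # hypothetical Secret of type kubernetes.io/tls
```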

Retrieve the credentials; the username is elastic

## Fetch the password
kubectl get secret -n elastic-system quickstart-es-elastic-user -o=jsonpath='{.data.elastic}' | base64 --decode; echo
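The secret stores the password base64-encoded, which is why a decode step (`base64 --decode` here, or go-template's `base64decode` earlier) is needed; a self-contained illustration with a made-up value:

```python
import base64

# Hypothetical value standing in for the secret's .data.elastic field
encoded = base64.b64encode(b"example-password").decode()
print(base64.b64decode(encoded).decode())  # example-password
```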

Cluster Tests

Check Cluster Health

## Run this command in a new terminal.
kubectl port-forward -n elastic-system services/quickstart-es-http 9200

## Fetch the password
PASSWORD=$(kubectl get secret -n elastic-system quickstart-es-elastic-user -o go-template='{{.data.elastic | base64decode}}')
## Check the status
curl -u "elastic:$PASSWORD" -k "https://localhost:9200/_cluster/health?pretty"

Expected healthy output

{
  "cluster_name" : "quickstart",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 2,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 14,
  "active_shards" : 14,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}

Inspect Unassigned Shard Details

  • prirep: r marks a replica shard (p a primary)
  • unassigned.reason: the reason the shard is unassigned (e.g. NODE_LEFT, INDEX_CREATED)
curl -u "elastic:$PASSWORD" -k "https://localhost:9200/_cat/shards?v&h=index,shard,prirep,state,unassigned.reason"

Output

index                                                         shard prirep state   unassigned.reason
.async-search                                                 0     p      STARTED 
.apm-agent-configuration                                      0     p      STARTED 
.apm-custom-link                                              0     p      STARTED 
.kibana-event-log-7.17.3-000001                               0     p      STARTED 
.geoip_databases                                              0     p      STARTED 
.kibana_security_session_1                                    0     p      STARTED 
.ds-ilm-history-5-2025.05.09-000001                           0     p      STARTED 
.kibana_task_manager_7.17.3_001                               0     p      STARTED 
.security-7                                                   0     p      STARTED 
.ds-.logs-deprecation.elasticsearch-default-2025.05.09-000001 0     p      STARTED 
product-other-log-2025.05.12                                  0     p      STARTED 
.tasks                                                        0     p      STARTED 
.kibana_7.17.3_001                                            0     p      STARTED 
product-other-log-2025.05.09                                  0     p      STARTED

Check Node Resource Usage

curl -u "elastic:$PASSWORD" -k "https://localhost:9200/_cat/nodes?v&h=name,disk.used_percent,ram.percent,cpu"

Output

name                    disk.used_percent ram.percent cpu
quickstart-es-masters-0              1.73          54   16
quickstart-es-data-0                 1.73          55   16
  • Reference thresholds:
    – disk usage ≤ 85%
    – memory usage ≤ 80%
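
A check against the thresholds above can be automated by parsing the same `_cat/nodes` columns; a minimal sketch over made-up sample values:

```python
DISK_LIMIT = 85.0  # disk usage threshold (%)
RAM_LIMIT = 80.0   # memory usage threshold (%)

def over_threshold(cat_nodes_output: str) -> list:
    """Return names of nodes exceeding the disk or RAM thresholds, given
    output of _cat/nodes?v&h=name,disk.used_percent,ram.percent,cpu."""
    flagged = []
    for line in cat_nodes_output.strip().splitlines()[1:]:  # skip header row
        name, disk, ram, _cpu = line.split()
        if float(disk) > DISK_LIMIT or float(ram) > RAM_LIMIT:
            flagged.append(name)
    return flagged

# Made-up sample in the _cat/nodes column layout
sample = """name    disk.used_percent ram.percent cpu
node-a  1.73              54          16
node-b  91.20             55          16
"""
print(over_threshold(sample))  # ['node-b']
```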

Check ES Node Status

curl -u "elastic:$PASSWORD" -k "https://localhost:9200/_cat/nodes?v"
 
ip             heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
172.20.129.129            8          55   16    0.96    0.76     0.57 m         *      quickstart-es-masters-0
172.20.129.130           70          56   16    0.96    0.76     0.57 dilt      -      quickstart-es-data-0
