Introduction
- OS: CentOS 7.9
- Kernel: 6.3.5-1.el7
- Kubernetes: v1.26.14
- Elasticsearch official site
- This walkthrough avoids the usual pitfalls; following the official instructions as-is runs into a few problems.
Environment Preparation
- Prepare Ceph or NFS storage
- NFS storage installation guide
- This install deploys EFK with the official ECK operator (an older release, 7.17.3, which most existing production environments still run).
- RBAC permissions and log-template settings are added so that Kubernetes metadata is attached to the shipped logs.
Install the Custom Resource Definitions and Operator
kubectl create -f https://download.elastic.co/downloads/eck/1.7.1/crds.yaml
kubectl apply -f https://download.elastic.co/downloads/eck/1.7.1/operator.yaml
Deploy Elasticsearch
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: quickstart
  namespace: elastic-system
spec:
  version: 7.17.3
  nodeSets:
  - name: masters
    count: 1
    config:
      node.roles: ["master"]
      xpack.ml.enabled: true
    podTemplate:
      spec:
        initContainers:
        - name: sysctl
          securityContext:
            privileged: true
            runAsUser: 0
          command: ['sh', '-c', 'sysctl -w vm.max_map_count=262144']
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data
      spec:
        storageClassName: nfs-dynamic
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 10Gi
  - name: data
    count: 1
    config:
      node.roles: ["data", "ingest", "ml", "transform"]
    podTemplate:
      spec:
        initContainers:
        - name: sysctl
          securityContext:
            privileged: true
            runAsUser: 0
          command: ['sh', '-c', 'sysctl -w vm.max_map_count=262144']
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data
      spec:
        storageClassName: nfs-dynamic
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 50Gi
For production, configure the cluster as described below; this is a test environment, so I took the easy route.
Differences Between Master Nodes and Data Nodes

| Aspect | Master node | Data node |
| --- | --- | --- |
| Core responsibility | Manages cluster metadata (indices, shard allocation, node state) | Stores data (primary and replica shards) and serves reads/writes (search, aggregations) |
| Role definition | `node.roles: ["master"]` | `node.roles: ["data", "ingest", "ml", "transform"]` |
| Resource needs | Low CPU/memory (lightweight metadata management) | High CPU/memory/disk (data processing and computation) |
| High availability | Must be redundant (at least 3 in production to avoid split-brain) | Scales horizontally (add or remove nodes based on data volume and load) |
| Typical work | Cluster coordination, shard allocation, state maintenance | Document ingestion, search handling, machine-learning jobs |
Production Tuning Suggestions
Master node configuration
nodeSets:
- name: masters
  count: 3  # deploy at least 3 in production
  config:
    node.roles: ["master"]
    # disable non-essential features to save resources
    xpack.ml.enabled: false
Data node role separation
- name: data-only
  count: 2
  config:
    node.roles: ["data"]  # dedicated to data storage
- name: ingest
  count: 2
  config:
    node.roles: ["ingest"]  # dedicated ingest nodes
- name: ml
  count: 1
  config:
    node.roles: ["ml", "transform"]  # separate compute nodes
Once the cluster is up, verify that Elasticsearch responds
## Run this in a second terminal, or send it to the background.
kubectl port-forward -n elastic-system services/quickstart-es-http 9200
## Fetch the password
PASSWORD=$(kubectl get secret -n elastic-system quickstart-es-elastic-user -o go-template='{{.data.elastic | base64decode}}')
## Test access
curl -u "elastic:$PASSWORD" -k "https://localhost:9200"
Deploy Filebeat
- Example DaemonSet-style configuration for Filebeat, deployed through the ECK Beat resource
apiVersion: beat.k8s.elastic.co/v1beta1
kind: Beat
metadata:
  name: filebeat
  namespace: elastic-system
spec:
  type: filebeat
  version: 7.17.3
  elasticsearchRef:
    name: quickstart            # name of the Elasticsearch resource
    namespace: elastic-system   # namespace the Elasticsearch cluster lives in
  config:
    filebeat.inputs:
    - type: container
      paths:
      - /var/log/containers/*.log
      processors:
      - add_kubernetes_metadata:   # attach k8s labels and related metadata
          host: ${NODE_NAME}
          matchers:
          - logs_path:
              logs_path: "/var/log/containers/"
      - drop_fields:               # add or remove fields here as needed
          fields: ["agent", "ecs", "container", "host", "host.name", "input", "log", "offset", "stream", "kubernetes.namespace", "kubernetes.labels.app", "kubernetes.node", "kubernetes.pod", "kubernetes.replicaset", "kubernetes.namespace_uid", "kubernetes.labels.pod-template-hash"]
          ignore_missing: true     # do not fail when a field is absent
      - decode_json_fields:
          fields: ["message"]      # raw field to parse
          target: ""               # parse into the event root (flattened fields)
          overwrite_keys: false    # whether to overwrite existing values
          process_array: false     # whether to parse array values
          max_depth: 1             # parse only one level of JSON
    output.elasticsearch:
      username: "elastic"          # built-in superuser (not recommended for production)
      password: "5ypyQpuC6BB191Si9w1209MM"  # replace with your password (inject via a Secret in production)
      index: "filebeat-other-log-%{+yyyy.MM.dd}"  # default index (daily rollover)
      indices:                     # index routing rules (split by condition)
      - index: "filebeat-containers-log-%{+yyyy.MM.dd}"
        when.or:
        - contains:
            kubernetes.labels.app: "etcd"
      - index: "filebeat-services-log-%{+yyyy.MM.dd}"
        when.contains:
          kubernetes.labels.type: "service"
      pipelines:                   # route events through ingest pipelines
      - pipeline: "filebeat-containers-log-pipeline"
        when.or:
        - contains:
            kubernetes.labels.app: "etcd"
      - pipeline: "filebeat-services-log-pipeline"
        when.contains:
          kubernetes.labels.type: "service"
    setup.template.settings:
      index:
        number_of_shards: 1        # one primary shard
        number_of_replicas: 0      # no replicas; use at least 1 in production
    setup.template.enabled: true   # template support must be enabled
    setup.template.overwrite: true # force-overwrite the old template
    setup.template.name: "filebeat-log-template"  # custom template name
    setup.template.pattern: "filebeat-*-log-*"    # match all log indices
    setup.ilm.enabled: false       # disable ILM (compatible with the manual template)
  daemonSet:
    podTemplate:
      spec:
        serviceAccount: elastic-beat-filebeat-quickstart
        automountServiceAccountToken: true
        dnsPolicy: ClusterFirstWithHostNet
        hostNetwork: true
        securityContext:
          runAsUser: 0
        containers:
        - name: filebeat
          env:
          - name: NODE_NAME
            valueFrom:
              fieldRef:
                fieldPath: spec.nodeName
          volumeMounts:
          - name: varlogcontainers
            mountPath: /var/log/containers
          - name: varlogpods
            mountPath: /var/log/pods
          - name: varlibdockercontainers
            mountPath: /var/lib/containers
        volumes:
        - name: varlogcontainers
          hostPath:
            path: /var/log/containers
        - name: varlogpods
          hostPath:
            path: /var/log/pods
        - name: varlibdockercontainers
          hostPath:
            path: /var/lib/containers
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: elastic-beat-filebeat-quickstart
  namespace: elastic-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: elastic-beat-autodiscover-binding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: elastic-beat-autodiscover
subjects:
- kind: ServiceAccount
  name: elastic-beat-filebeat-quickstart
  namespace: elastic-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: elastic-beat-autodiscover
rules:
- apiGroups:
  - ""
  resources:
  - nodes
  - namespaces
  - events
  - pods
  verbs:
  - get
  - list
  - watch
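The `decode_json_fields` processor in the manifest above flattens one level of a JSON `message` into top-level event fields. A minimal Python sketch of that behavior, for intuition only (the sample event is made up; this is not Filebeat's actual implementation):

```python
import json

def decode_json_fields(event, fields=("message",), overwrite_keys=False):
    """Rough sketch of decode_json_fields with target: "" and max_depth: 1.
    One level of JSON is parsed; nested objects stay as plain dicts."""
    for field in fields:
        raw = event.get(field)
        if not isinstance(raw, str):
            continue
        try:
            parsed = json.loads(raw)
        except ValueError:
            continue  # non-JSON messages are left untouched
        if not isinstance(parsed, dict):
            continue
        for key, value in parsed.items():
            if overwrite_keys or key not in event:
                event[key] = value  # flatten into the event root
    return event

# Hypothetical container log event
event = {"message": '{"level": "info", "msg": "started"}'}
decode_json_fields(event)
print(event["level"])  # info
```

With `overwrite_keys: false`, fields that already exist on the event (such as the Kubernetes metadata added earlier in the processor chain) are never clobbered by values from the parsed message.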
Verify that Filebeat is shipping logs
PASSWORD=$(kubectl get secret -n elastic-system quickstart-es-elastic-user -o go-template='{{.data.elastic | base64decode}}')
curl -u "elastic:$PASSWORD" -k "https://localhost:9200/filebeat-*/_search"
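The `%{+yyyy.MM.dd}` suffix in the index names above is a Joda-style date pattern that Beats expands at publish time, yielding one index per day. A small Python sketch of the expansion (the helper and token table are illustrative, not part of Beats):

```python
from datetime import datetime

# Joda-style tokens used in this document's index patterns -> strftime
TOKENS = {"yyyy": "%Y", "MM": "%m", "dd": "%d"}

def expand_index(pattern: str, when: datetime) -> str:
    """Expand the simple %{+...} date form, e.g. filebeat-other-log-%{+yyyy.MM.dd}."""
    start = pattern.find("%{+")
    if start == -1:
        return pattern  # no date math in the pattern
    end = pattern.index("}", start)
    fmt = pattern[start + 3:end]
    for joda, py in TOKENS.items():
        fmt = fmt.replace(joda, py)
    return pattern[:start] + when.strftime(fmt) + pattern[end + 1:]

print(expand_index("filebeat-other-log-%{+yyyy.MM.dd}", datetime(2025, 5, 12)))
# filebeat-other-log-2025.05.12
```

These daily indices are what the `setup.template.pattern` value `filebeat-*-log-*` in the manifest matches.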
Deploy Kibana
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: kibana-data-pvc
  namespace: elastic-system
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
  storageClassName: nfs-dynamic
---
apiVersion: kibana.k8s.elastic.co/v1
kind: Kibana
metadata:
  name: quickstart
  namespace: elastic-system
spec:
  version: 7.17.3
  count: 1
  elasticsearchRef:
    name: quickstart
    namespace: elastic-system
  http:
    tls:
      selfSignedCertificate:
        disabled: true
  config:
    i18n.locale: "zh-CN"  # enable the Chinese locale
  podTemplate:
    spec:
      containers:
      - name: kibana
        env:
        - name: NODE_OPTIONS
          value: "--max-old-space-size=2048"
        volumeMounts:
        - mountPath: /usr/share/kibana/data
          name: kibana-data
      volumes:
      - name: kibana-data
        persistentVolumeClaim:
          claimName: kibana-data-pvc
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: kibana-ingress
  namespace: elastic-system
spec:
  ingressClassName: nginx
  rules:
  - host: kibana.deployers.cn
    http:
      paths:
      - backend:
          service:
            name: quickstart-kb-http
            port:
              name: http
        path: /
        pathType: Prefix
  tls:
  - hosts:
    - kibana.deployers.cn
Retrieve the credentials; the username is elastic
## Fetch the password
kubectl get secret -n elastic-system quickstart-es-elastic-user -o=jsonpath='{.data.elastic}' | base64 --decode; echo
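`jsonpath` prints the Secret value still base64-encoded, which is why the command pipes through `base64 --decode`. The same decode step in Python (the encoded string below is the sample password from the Filebeat manifest, not a real credential):

```python
import base64

# Kubernetes stores Secret data base64-encoded; this is the encoded form
# of the sample password used earlier in this walkthrough.
encoded = "NXlweVFwdUM2QkIxOTFTaTl3MTIwOU1N"
password = base64.b64decode(encoded).decode("utf-8")
print(password)  # 5ypyQpuC6BB191Si9w1209MM
```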
Cluster Tests
Check cluster health
## Run this command in a new terminal.
kubectl port-forward -n elastic-system services/quickstart-es-http 9200
## Fetch the password
PASSWORD=$(kubectl get secret -n elastic-system quickstart-es-elastic-user -o go-template='{{.data.elastic | base64decode}}')
## Check the status
curl -u "elastic:$PASSWORD" -k "https://localhost:9200/_cluster/health?pretty"
Expected healthy output
{
  "cluster_name" : "quickstart",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 2,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 14,
  "active_shards" : 14,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}
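A deploy script can gate on this health response instead of eyeballing it. A minimal sketch, assuming the JSON has already been fetched and parsed (the helper name is made up):

```python
def cluster_ok(health: dict) -> bool:
    """Healthy means green status with no shards unassigned or initializing."""
    return (
        health["status"] == "green"
        and health["unassigned_shards"] == 0
        and health["initializing_shards"] == 0
    )

# Values copied from the sample response above
health = {"status": "green", "unassigned_shards": 0, "initializing_shards": 0}
print(cluster_ok(health))  # True
```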
Inspect unassigned shard details
- prirep: `r` marks a replica shard
- unassigned.reason: why the shard is unassigned (e.g. NODE_LEFT, INDEX_CREATED)
curl -u "elastic:$PASSWORD" -k "https://localhost:9200/_cat/shards?v&h=index,shard,prirep,state,unassigned.reason"
Output
index shard prirep state unassigned.reason
.async-search 0 p STARTED
.apm-agent-configuration 0 p STARTED
.apm-custom-link 0 p STARTED
.kibana-event-log-7.17.3-000001 0 p STARTED
.geoip_databases 0 p STARTED
.kibana_security_session_1 0 p STARTED
.ds-ilm-history-5-2025.05.09-000001 0 p STARTED
.kibana_task_manager_7.17.3_001 0 p STARTED
.security-7 0 p STARTED
.ds-.logs-deprecation.elasticsearch-default-2025.05.09-000001 0 p STARTED
product-other-log-2025.05.12 0 p STARTED
.tasks 0 p STARTED
.kibana_7.17.3_001 0 p STARTED
product-other-log-2025.05.09 0 p STARTED
Check node resource usage
curl -u "elastic:$PASSWORD" -k "https://localhost:9200/_cat/nodes?v&h=name,disk.used_percent,ram.percent,cpu"
Output
name disk.used_percent ram.percent cpu
quickstart-es-masters-0 1.73 54 16
quickstart-es-data-0 1.73 55 16
- Reference thresholds:
  - Disk usage ≤ 85%
  - Memory usage ≤ 80%
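The thresholds above are easy to check mechanically against `_cat/nodes` output. A sketch, assuming the four-column format requested by the command (the limits and function name are this document's suggestions, not Elasticsearch defaults):

```python
DISK_LIMIT = 85.0  # disk usage should stay at or below 85%
RAM_LIMIT = 80.0   # memory usage should stay at or below 80%

# Sample output copied from the _cat/nodes call above
SAMPLE = """\
name                    disk.used_percent ram.percent cpu
quickstart-es-masters-0              1.73          54  16
quickstart-es-data-0                 1.73          55  16
"""

def over_threshold(cat_nodes: str):
    """Return names of nodes whose disk or RAM usage exceeds the limits."""
    alerts = []
    for line in cat_nodes.strip().splitlines()[1:]:  # skip the header row
        name, disk, ram, _cpu = line.split()
        if float(disk) > DISK_LIMIT or float(ram) > RAM_LIMIT:
            alerts.append(name)
    return alerts

print(over_threshold(SAMPLE))  # []
```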
Check Elasticsearch node status
curl -u "elastic:$PASSWORD" -k "https://localhost:9200/_cat/nodes?v"
ip heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
172.20.129.129 8 55 16 0.96 0.76 0.57 m * quickstart-es-masters-0
172.20.129.130 70 56 16 0.96 0.76 0.57 dilt - quickstart-es-data-0