During a recent round of maintenance on a Kubernetes cluster, I noticed that multiple Pods using the same image kept landing on one fixed node, leaving the cluster's node resources unevenly utilized. After turning on the scheduler's score logging, the behavior turned out to be caused by the ImageLocality score plugin: only one node already had the Pod's image, so the scheduler always gave that node the highest score. Disabling the plugin resolved the uneven scheduling. I'm sharing the steps here in the hope that they are useful to others.
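For background, ImageLocality scores a node higher when it already has the Pod's container image cached locally. A quick way to check which nodes hold a given image is to read each node's .status.images; the image name below is only a placeholder:
# List every node together with the images cached on it, then filter for the image in question
# ("myapp:1.0" is a placeholder; substitute your own image)
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.images[*].names}{"\n"}{end}' | grep 'myapp:1.0'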
1. Create a custom configuration file
# vi /etc/kubernetes/config.yaml
# Disable the ImageLocality score plugin here (by setting its weight to 0)
...
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
clientConnection:
  kubeconfig: /etc/kubernetes/scheduler.conf
profiles:
  - schedulerName: default-scheduler
    plugins:
      multiPoint:
        enabled:
          - name: ImageLocality
            weight: 0
...
- (Optional) A second way to disable the score plugin; use whichever variant fits your situation
# vi /etc/kubernetes/config.yaml
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
clientConnection:
  kubeconfig: /etc/kubernetes/scheduler.conf
profiles:
  - schedulerName: default-scheduler
    plugins:
      score:
        disabled:
          - name: ImageLocality
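Either variant ends up in the same /etc/kubernetes/config.yaml. As an optional sanity check before the scheduler picks it up, you can confirm the file at least parses as YAML; this sketch assumes python3 with PyYAML is available on the control-plane node:
# Optional: verify the configuration file is syntactically valid YAML (assumes python3 + PyYAML)
python3 -c "import yaml; yaml.safe_load(open('/etc/kubernetes/config.yaml')); print('YAML OK')"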
2. Configure kube-scheduler
# Edit kube-scheduler.yaml:
# add the --config flag to spec.containers[0].command and mount the configuration file
vi /etc/kubernetes/manifests/kube-scheduler.yaml
...
    - --config=/etc/kubernetes/config.yaml
...
    - mountPath: /etc/kubernetes/config.yaml   # add this mount
      name: config
      readOnly: true
...
  - hostPath:
      path: /etc/kubernetes/config.yaml
      type: FileOrCreate
    name: config
...
# Wait for kube-scheduler to restart automatically
kubectl get pod -n kube-system
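If the scheduler pod does not come back cleanly, a quick check is whether the new flag was actually picked up; the pod name kube-scheduler-master matches the examples later in this post, so substitute your own control-plane node name:
# Confirm the restarted scheduler is running with the new --config flag
kubectl -n kube-system describe pod kube-scheduler-master | grep -- '--config'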
# The complete kube-scheduler.yaml looks like this
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    component: kube-scheduler
    tier: control-plane
  name: kube-scheduler
  namespace: kube-system
spec:
  containers:
  - command:
    - kube-scheduler
    - --authentication-kubeconfig=/etc/kubernetes/scheduler.conf
    - --authorization-kubeconfig=/etc/kubernetes/scheduler.conf
    - --bind-address=127.0.0.1
    - --kubeconfig=/etc/kubernetes/scheduler.conf
    - --leader-elect=true
    - --config=/etc/kubernetes/config.yaml
    - --v=10
    image: registry.cn-hangzhou.aliyuncs.com/google_containers/kube-scheduler:v1.31.0
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 8
      httpGet:
        host: 127.0.0.1
        path: /healthz
        port: 10259
        scheme: HTTPS
      initialDelaySeconds: 10
      periodSeconds: 10
      timeoutSeconds: 15
    name: kube-scheduler
    resources:
      requests:
        cpu: 100m
    startupProbe:
      failureThreshold: 24
      httpGet:
        host: 127.0.0.1
        path: /healthz
        port: 10259
        scheme: HTTPS
      initialDelaySeconds: 10
      periodSeconds: 10
      timeoutSeconds: 15
    volumeMounts:
    - mountPath: /etc/kubernetes/scheduler.conf
      name: kubeconfig
      readOnly: true
    - mountPath: /etc/kubernetes/config.yaml
      name: config
      readOnly: true
  hostNetwork: true
  priority: 2000001000
  priorityClassName: system-node-critical
  securityContext:
    seccompProfile:
      type: RuntimeDefault
  volumes:
  - hostPath:
      path: /etc/kubernetes/scheduler.conf
      type: FileOrCreate
    name: kubeconfig
  - hostPath:
      path: /etc/kubernetes/config.yaml
      type: FileOrCreate
    name: config
status: {}
3. Verify the custom configuration takes effect
# For how to enable verbose kube-scheduler logging, see this post:
# https://blog.csdn.net/mm1234556/article/details/148686859
# View the loaded configuration in the scheduler logs
kubectl logs -n kube-system kube-scheduler-master |grep -A50 apiVersion:
# Manually create a pod (a throwaway example follows below) and check the per-plugin scoring details in the kube-scheduler logs
kubectl logs kube-scheduler-master -n kube-system |grep -A10 score
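A throwaway pod is enough to trigger a fresh scheduling cycle for the command above; the name and image here are only placeholders:
# Create a disposable test pod, then check which node it was bound to
kubectl run score-test --image=nginx --restart=Never
kubectl get pod score-test -o wide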
# Detailed log output: the pod is eventually scheduled onto node04
I0623 01:28:10.258309 1 resource_allocation.go:78] mysql-server-68468bcd96-j66bj -> node02: NodeResourcesBalancedAllocation, map of allocatable resources map[cpu:28000 memory:32772333568], map of requested resources map[cpu:3200 memory:5674337280] ,score 94,
I0623 01:28:10.258309 1 resource_allocation.go:78] mysql-server-68468bcd96-j66bj -> node01: NodeResourcesBalancedAllocation, map of allocatable resources map[cpu:28000 memory:32772333568], map of requested resources map[cpu:3820 memory:6427770880] ,score 94,
I0623 01:28:10.258312 1 resource_allocation.go:78] mysql-server-68468bcd96-j66bj -> node04: NodeResourcesBalancedAllocation, map of allocatable resources map[cpu:8000 memory:32122888192], map of requested resources map[cpu:1050 memory:1468006400] ,score 91,
I0623 01:28:10.258334 1 resource_allocation.go:78] mysql-server-68468bcd96-j66bj -> node02: NodeResourcesLeastAllocated, map of allocatable resources map[cpu:28000 memory:32772333568], map of requested resources map[cpu:3200 memory:5674337280] ,score 85,
I0623 01:28:10.258339 1 resource_allocation.go:78] mysql-server-68468bcd96-j66bj -> node04: NodeResourcesLeastAllocated, map of allocatable resources map[cpu:8000 memory:32122888192], map of requested resources map[cpu:1050 memory:1468006400] ,score 90,
I0623 01:28:10.258338 1 resource_allocation.go:78] mysql-server-68468bcd96-j66bj -> node01: NodeResourcesLeastAllocated, map of allocatable resources map[cpu:28000 memory:32772333568], map of requested resources map[cpu:3820 memory:6427770880] ,score 83,
I0623 01:28:10.258375 1 generic_scheduler.go:504] Plugin NodePreferAvoidPods scores on test-5/mysql-server-68468bcd96-j66bj => [{node01 1000000} {node04 1000000} {node02 1000000}]
I0623 01:28:10.258384 1 generic_scheduler.go:504] Plugin PodTopologySpread scores on test-5/mysql-server-68468bcd96-j66bj => [{node01 0} {node04 0} {node02 0}]
I0623 01:28:10.258389 1 generic_scheduler.go:504] Plugin TaintToleration scores on test-5/mysql-server-68468bcd96-j66bj => [{node01 100} {node04 100} {node02 100}]
I0623 01:28:10.258393 1 generic_scheduler.go:504] Plugin NodeResourcesBalancedAllocation scores on test-5/mysql-server-68468bcd96-j66bj => [{node01 94} {node04 91} {node02 94}]
I0623 01:28:10.258396 1 generic_scheduler.go:504] Plugin InterPodAffinity scores on test-5/mysql-server-68468bcd96-j66bj => [{node01 0} {node04 0} {node02 0}]
I0623 01:28:10.258400 1 generic_scheduler.go:504] Plugin NodeResourcesLeastAllocated scores on test-5/mysql-server-68468bcd96-j66bj => [{node01 83} {node04 90} {node02 85}]
I0623 01:28:10.258404 1 generic_scheduler.go:504] Plugin NodeAffinity scores on test-5/mysql-server-68468bcd96-j66bj => [{node01 0} {node04 0} {node02 0}]
I0623 01:28:10.258409 1 generic_scheduler.go:560] Host node01 => Score 1000277
I0623 01:28:10.258412 1 generic_scheduler.go:560] Host node04 => Score 1000281
I0623 01:28:10.258414 1 generic_scheduler.go:560] Host node02 => Score 1000279
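For reference, each host's final score is simply the sum of its per-plugin scores above, which is why node04 wins once ImageLocality no longer contributes:
# node01: 1000000 (NodePreferAvoidPods) + 100 (TaintToleration) + 94 (BalancedAllocation) + 83 (LeastAllocated) = 1000277
# node02: 1000000 + 100 + 94 + 85 = 1000279
# node04: 1000000 + 100 + 91 + 90 = 1000281  -> highest score, so the pod is bound to node04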