Things to watch for when creating IPv6-only and IPv6+IPv4 k8s clusters

Published: 2025-06-07

Keywords: CNI calico vxlan flannel ipv6-only ipv6+ipv4

When building an IPv6-only or IPv6+IPv4 (dual-stack) k8s cluster, I found that after a worker node joined the cluster, the CNI on that worker node failed to start.

Here is what the failure looks like with Calico:

kubectl get pod -A

Output:

NAMESPACE     NAME                                      READY   STATUS                  RESTARTS      AGE
kube-system   calico-kube-controllers-79949b87d-ptq2r   1/1     Running                 0             19m
kube-system   calico-node-jbrn7                         0/1     Init:CrashLoopBackOff   7 (40s ago)   14m
kube-system   calico-node-xnwdx                         1/1     Running                 0             19m
kube-system   coredns-6766b7b6bb-wc5j5                  1/1     Running                 0             20m
kube-system   coredns-6766b7b6bb-wvg5w                  1/1     Running                 0             20m
kube-system   etcd-myserver1                            1/1     Running                 0             20m
kube-system   kube-apiserver-myserver1                  1/1     Running                 0             20m
kube-system   kube-controller-manager-myserver1         1/1     Running                 0             20m
kube-system   kube-proxy-g8gxb                          1/1     Running                 0             20m
kube-system   kube-proxy-lnddv                          1/1     Running                 0             14m
kube-system   kube-scheduler-myserver1                  1/1     Running                 0             20m

Inspect the pod calico-node-jbrn7 in detail:

kubectl describe pod -n kube-system calico-node-jbrn7

The output looks similar to this:

Events:
  Type     Reason     Age                  From               Message
  ----     ------     ----                 ----               -------
  Normal   Scheduled  20m                  default-scheduler  Successfully assigned kube-system/calico-node-jbrn7 to worker1
  Normal   Pulled     20m                  kubelet            Container image "docker.io/calico/cni:v3.29.3" already present on machine
  Normal   Created    20m                  kubelet            Created container: upgrade-ipam
  Normal   Started    20m                  kubelet            Started container upgrade-ipam
  Normal   Created    6m40s (x8 over 20m)  kubelet            Created container: install-cni
  Normal   Started    6m40s (x8 over 20m)  kubelet            Started container install-cni
  Normal   Pulled     68s (x9 over 20m)    kubelet            Container image "docker.io/calico/cni:v3.29.3" already present on machine
  Warning  BackOff    11s (x76 over 20m)   kubelet            Back-off restarting failed container install-cni in pod calico-node-jbrn7_kube-system(a888f1ad-ec45-4207-94ac-f2953bda9d0e)

The Events show that the failure happens while running the container named install-cni inside the pod.

The logs of the install-cni container then show the following:

2025-05-29 09:53:04.523 [INFO][1] cni-installer/install.go 233: CNI plugin version: v3.29.3
2025-05-29 09:53:04.523 [INFO][1] cni-installer/install.go 185: /host/secondary-bin-dir is not writeable, skipping
2025-05-29 09:53:04.523 [WARNING][1] cni-installer/winutils.go 150: Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
2025-05-29 09:53:34.524 [ERROR][1] cni-installer/token_watch.go 108: Unable to create token for CNI kubeconfig error=Post "https://[fd15:4ba5:5a2b:1008:2000::1]:443/api/v1/namespaces/kube-system/serviceaccounts/calico-cni-plugin/token": dial tcp [fd15:4ba5:5a2b:1008:2000::1]:443: i/o timeout
2025-05-29 09:53:34.524 [FATAL][1] cni-installer/install.go 478: Unable to create token for CNI kubeconfig error=Post "https://[fd15:4ba5:5a2b:1008:2000::1]:443/api/v1/namespaces/kube-system/serviceaccounts/calico-cni-plugin/token": dial tcp [fd15:4ba5:5a2b:1008:2000::1]:443: i/o timeout

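In other words, the container cannot connect to the API server's ClusterIP, [fd15:4ba5:5a2b:1008:2000::1]:443, and the request times out.
This problem does not occur with IPv4.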

I will skip the underlying theory and go straight to the fix:

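  • Modify Calico's YAML file so that calico-node connects to the API server node's own IPv6 address, i.e. the IPv6 address shown by ip a on that host, instead of the ClusterIP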

Add a new ConfigMap object named kubernetes-services-endpoint to calico.yaml, as shown below:

kind: ConfigMap
apiVersion: v1
metadata:
  name: kubernetes-services-endpoint
  namespace: kube-system
data:
  # Node IP of the API server
  KUBERNETES_SERVICE_HOST: "fd15:4ba5:5a2b:1008:192:168:186:40"
  KUBERNETES_SERVICE_PORT: "6443"
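
For context, here is a rough sketch of why this ConfigMap takes effect. As far as I can tell from the stock calico.yaml manifest, the containers of the calico-node DaemonSet reference a ConfigMap with exactly this name through an optional envFrom entry, so its keys become environment variables that override the in-cluster defaults. The excerpt below is abridged and reconstructed from memory, not copied from the v3.29.3 manifest, so check your own copy:

```yaml
# Abridged, illustrative excerpt of the calico-node DaemonSet pod spec.
# Assumption: your calico.yaml contains equivalent envFrom entries.
initContainers:
  - name: install-cni
    envFrom:
      - configMapRef:
          # Must match the name of the ConfigMap you add
          name: kubernetes-services-endpoint
          # optional: the pod still starts when the ConfigMap is absent
          optional: true
containers:
  - name: calico-node
    envFrom:
      - configMapRef:
          name: kubernetes-services-endpoint
          optional: true
```

Because the reference is by name and marked optional, a ConfigMap with a different name is silently ignored rather than reported as an error.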

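Notes:

  • 1. You must create a new ConfigMap object; do not edit the existing ConfigMap object named calico-config.
  • 2. The new ConfigMap object must be named kubernetes-services-endpoint.
  • 3. For IPv6 single-stack and IPv6-primary dual-stack clusters, you also need to set other related environment variables in the DaemonSet in calico.yaml; I won't go into them here.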
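
As a hedged pointer for note 3, the environment variables usually involved on the calico-node container look roughly like the sketch below. This is an illustration for an IPv6-only cluster, not a complete or verified configuration. The pool CIDR is a hypothetical placeholder and must match your cluster's pod CIDR; for dual-stack you would typically leave IP set to autodetect instead of none.

```yaml
# Sketch only: env entries for the calico-node container (IPv6-only case).
# Assumption: all values need to be adapted to your cluster.
env:
  # Do not autodetect an IPv4 address on an IPv6-only node
  - name: IP
    value: "none"
  # Autodetect the node's IPv6 address
  - name: IP6
    value: "autodetect"
  # Enable IPv6 support in Felix
  - name: FELIX_IPV6SUPPORT
    value: "true"
  # Without an IPv4 address, derive the router ID from a hash of the node name
  - name: CALICO_ROUTER_ID
    value: "hash"
  # Hypothetical placeholder: replace with your IPv6 pod CIDR
  - name: CALICO_IPV6POOL_CIDR
    value: "fd00:10:244::/56"
```

The IPAM section of the CNI network config in the calico-config ConfigMap normally also needs assign_ipv6 set to "true" (and assign_ipv4 set to "false" for IPv6-only); consult the Calico IPv6 documentation for the full list.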

Here is the corresponding change for flannel.
In kube-flannel.yml, edit the env section of the DaemonSet and add the environment variables KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT, as shown below:

---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  labels:
    app: flannel
    k8s-app: flannel
    tier: node
  name: kube-flannel-ds
  namespace: kube-flannel
spec:
  selector:
    matchLabels:
      app: flannel
      k8s-app: flannel
  template:
    metadata:
      labels:
        app: flannel
        k8s-app: flannel
        tier: node
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: kubernetes.io/os
                operator: In
                values:
                - linux
      containers:
      - args:
        - --ip-masq
        - --kube-subnet-mgr
        command:
        - /opt/bin/flanneld
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: EVENT_QUEUE_DEPTH
          value: "5000"
        - name: FLANNELD_IFACE
          value: "ens33"
        # Node IP address of the API server
        - name: KUBERNETES_SERVICE_HOST
          value: "fd15:4ba5:5a2b:1008:192:168:186:40"
        - name: KUBERNETES_SERVICE_PORT
          value: "6443"
        image: ghcr.io/flannel-io/flannel:v0.26.7
        name: kube-flannel
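
The DaemonSet change above only fixes how flanneld reaches the API server. For an IPv6-only cluster, flannel's network configuration usually has to enable IPv6 as well. The following is a minimal sketch of the net-conf.json part of the kube-flannel-cfg ConfigMap, assuming the vxlan backend; the IPv6Network value is a hypothetical placeholder that must match your cluster's pod CIDR, and the other keys of the real ConfigMap (such as cni-conf.json) are left out:

```yaml
# Sketch only: abridged kube-flannel-cfg ConfigMap for an IPv6-only cluster.
# Assumption: "fd00:10:244::/56" is a placeholder, not a value from this setup.
kind: ConfigMap
apiVersion: v1
metadata:
  name: kube-flannel-cfg
  namespace: kube-flannel
data:
  net-conf.json: |
    {
      "EnableIPv4": false,
      "EnableIPv6": true,
      "IPv6Network": "fd00:10:244::/56",
      "Backend": {
        "Type": "vxlan"
      }
    }
```

For dual-stack, you would keep IPv4 enabled and specify both "Network" and "IPv6Network".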