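1. Why choose k3s
k3s is a lightweight, CNCF-certified Kubernetes distribution, originally developed by Rancher rather than released by the Kubernetes project itself. It ships as a single binary and the project provides an installation script, so compared with a full Kubernetes deployment you only need to set a few parameters and the rollout is very simple. Although it is a lightweight distribution, k3s behaves almost identically to upstream Kubernetes in everyday use, which makes it a good base for small projects. k3s is highly integrated and can run an embedded etcd, but in production the performance and data safety of an embedded etcd need careful thought, so this guide skips the embedded etcd and deploys a separate etcd cluster. For network performance, Calico replaces the built-in Flannel network plugin.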
2. Cluster planning
Operating system: Rocky Linux release 9.6 (Blue Onyx)
Architecture: x86_64
| Machine IP | Hostname | Role |
|---|---|---|
| 192.168.18.11 | k3s1 | etcd, k3s master node |
| 192.168.18.12 | k3s2 | etcd, k3s master node |
| 192.168.18.13 | k3s3 | etcd, k3s master node |
| 192.168.18.14 | k3s4 | k3s worker node |
| 192.168.18.15 | k3s5 | k3s worker node |
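Before installing, disable the firewall and SELinux on every machine (setenforce 0 switches SELinux to permissive mode immediately; the change in /etc/selinux/config takes effect after the next reboot):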
systemctl stop firewalld && systemctl disable firewalld
setenforce 0
sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config
Add the hostname mappings to /etc/hosts on every machine:
 cat /etc/hosts
192.168.18.11 k3s1
192.168.18.12 k3s2
192.168.18.13 k3s3
192.168.18.14 k3s4
192.168.18.15 k3s5
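The mappings above can be added in a repeatable way. The sketch below is one possible approach, not part of the original procedure: add_host_entry and add_cluster_hosts are hypothetical helpers that append an entry only when the hostname is not already mapped, so running them twice does not duplicate lines. The hosts-file path is a parameter so the functions can be tried on a scratch file first.

```shell
#!/bin/sh
# add_host_entry FILE IP NAME
# Hypothetical helper: append "IP NAME" to FILE unless NAME already
# appears as a whole word on a non-comment line.
add_host_entry() {
    hosts_file=$1
    ip=$2
    name=$3
    if grep -v '^[[:space:]]*#' "$hosts_file" 2>/dev/null | grep -qw -- "$name"; then
        return 0
    fi
    printf '%s %s\n' "$ip" "$name" >> "$hosts_file"
}

# add_cluster_hosts FILE
# Adds the five machines from the cluster plan in this guide
# (192.168.18.11 to 192.168.18.15, k3s1 to k3s5).
add_cluster_hosts() {
    i=1
    while [ "$i" -le 5 ]; do
        add_host_entry "$1" "192.168.18.1$i" "k3s$i"
        i=$((i + 1))
    done
}
```

As root, `add_cluster_hosts /etc/hosts` would apply the plan to the real file.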
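3. Install the etcd cluster
3.1 Download the etcd binary package from GitHub
etcd: https://github.com/etcd-io/etcd/releases
This guide uses version 3.5.21. The systemd unit in section 3.3 expects the etcd binary at /usr/local/etcd/etcd, and the verification commands in section 3.5 expect etcdctl to be on the PATH.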
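The later unit file runs /usr/local/etcd/etcd, so the release archive has to be unpacked to that location. As a rough sketch, assuming the usual naming of etcd release assets on GitHub (etcd-vVERSION-linux-ARCH.tar.gz), the helper below builds the download URL. etcd_release_url is a hypothetical name, and the install commands in the comments are one way to lay out the files, not necessarily the steps the author used.

```shell
#!/bin/sh
# etcd_release_url VERSION [ARCH]
# Hypothetical helper: print the GitHub download URL for an etcd release
# archive, assuming the asset name etcd-vVERSION-linux-ARCH.tar.gz.
etcd_release_url() {
    version=$1
    arch=${2:-amd64}
    printf 'https://github.com/etcd-io/etcd/releases/download/v%s/etcd-v%s-linux-%s.tar.gz\n' \
        "$version" "$version" "$arch"
}

# Example (as root on each etcd node; the target paths are assumptions
# chosen to match the ExecStart path used later in this guide):
#   curl -fLO "$(etcd_release_url 3.5.21)"
#   mkdir -p /usr/local/etcd
#   tar -xzf etcd-v3.5.21-linux-amd64.tar.gz -C /usr/local/etcd --strip-components=1
#   ln -sf /usr/local/etcd/etcdctl /usr/local/bin/etcdctl
```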
 
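3.2 Generate certificates
Generate the certificates on k3s1, then copy them to the other nodes.
3.2.1 Install cfssl
Two binaries are needed:
cfssl: https://pkg.cfssl.org/R1.2/cfssl_linux-amd64
cfssljson: https://pkg.cfssl.org/R1.2/cfssljson_linux-amd64
After downloading, upload both files to /usr/local/bin on k3s1:
cd /usr/local/bin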
mv cfssl_linux-amd64 cfssl
mv cfssljson_linux-amd64 cfssljson
chmod +x cfssl*
3.2.2 Create the certificate directory
mkdir -p /etc/etcd/ssl
cd /etc/etcd/ssl
3.2.3 Create the CA configuration file
cat ca-config.json
{
  "signing": {
    "default": {
      "expiry": "876000h"
    },
    "profiles": {
      "etcd": {
        "expiry": "876000h",
        "usages": ["signing", "key encipherment", "server auth", "client auth"]
      }
    }
  }
}
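876000h equals 100 years (876000 / 24 / 365 = 100), so certificates signed with this profile are valid for 100 years. The CA certificate created in the next step does not take its lifetime from this file: unless ca-csr.json contains a "ca": {"expiry": "..."} entry, cfssl applies its own default to the CA (five years in the cfssl versions I am aware of). You can check the actual end date with openssl x509 -noout -enddate -in ca.pem.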
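The expiry arithmetic can be checked directly in the shell. This is only a small illustration: hours_to_years is a hypothetical helper, and it counts 365-day years, ignoring leap days.

```shell
#!/bin/sh
# hours_to_years HOURS
# Hypothetical helper: convert a cfssl-style expiry given in hours to
# whole 365-day years using integer division.
hours_to_years() {
    echo $(( $1 / 24 / 365 ))
}

hours_to_years 876000   # prints 100
```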
3.2.4 Create the CA certificate signing request (CSR) file
cat ca-csr.json
{
  "CN": "etcd-ca",
  "key": {
    "algo": "rsa",
    "size": 2048
  },
  "names": [
    {
      "C": "CN",
      "ST": "Beijing",
      "L": "Beijing",
      "O": "etcd",
      "OU": "Etcd Security"
    }
  ]
}
Generate the CA:
cfssl gencert -initca ca-csr.json | cfssljson -bare ca
This creates ca.pem and ca-key.pem (plus ca.csr) in the current directory.
3.2.5 Generate the etcd node certificate
cat etcd-csr.json
{
  "CN": "etcd",
  "hosts": [
    "127.0.0.1",
    "192.168.18.11",
    "192.168.18.12",
    "192.168.18.13"
  ],
  "key": {
    "algo": "rsa",
    "size": 2048
  },
  "names": [
    {
      "C": "CN",
      "ST": "Beijing",
      "L": "Beijing",
      "O": "etcd",
      "OU": "Etcd Security"
    }
  ]
}
Generate the certificate:
cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=ca-config.json -profile=etcd etcd-csr.json | cfssljson -bare server
This produces server.pem and server-key.pem.
Copy the entire /etc/etcd/ssl directory to the other two k3s master nodes, k3s2 and k3s3.
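Copying the directory can be scripted. The following is a sketch that assumes root SSH access from k3s1 to the other masters, which the original text does not state. cert_copy_commands is a hypothetical helper that only prints the commands, so they can be reviewed before being piped to sh.

```shell
#!/bin/sh
# cert_copy_commands HOST...
# Hypothetical helper: print the ssh/scp commands that copy /etc/etcd/ssl
# from this machine to each HOST. Assumes root SSH access to every HOST.
cert_copy_commands() {
    for host in "$@"; do
        # Create the parent directory first, then copy the whole ssl dir.
        printf 'ssh root@%s mkdir -p /etc/etcd\n' "$host"
        printf 'scp -r /etc/etcd/ssl root@%s:/etc/etcd/\n' "$host"
    done
}

# Review the commands for the two remaining masters.
cert_copy_commands k3s2 k3s3

# Once they look right, run them:
#   cert_copy_commands k3s2 k3s3 | sh
```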
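3.3 Configure the etcd systemd service
Create /etc/systemd/system/etcd.service on each of the three k3s master machines.
The --data-dir parameter in etcd.service sets the etcd storage directory; adjust it to suit your own deployment.
3.3.1 k3s1: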
cat /etc/systemd/system/etcd.service
[Unit]
Description=etcd
After=network.target
[Service]
ExecStart=/usr/local/etcd/etcd \
  --name etcd1 \
  --data-dir=/data/etcd \
  --listen-peer-urls=https://192.168.18.11:2380 \
  --listen-client-urls=https://192.168.18.11:2379,https://127.0.0.1:2379 \
  --advertise-client-urls=https://192.168.18.11:2379 \
  --initial-advertise-peer-urls=https://192.168.18.11:2380 \
  --initial-cluster=etcd1=https://192.168.18.11:2380,etcd2=https://192.168.18.12:2380,etcd3=https://192.168.18.13:2380 \
  --initial-cluster-token=etcd-cluster \
  --initial-cluster-state=new \
  --cert-file=/etc/etcd/ssl/server.pem \
  --key-file=/etc/etcd/ssl/server-key.pem \
  --trusted-ca-file=/etc/etcd/ssl/ca.pem \
  --peer-cert-file=/etc/etcd/ssl/server.pem \
  --peer-key-file=/etc/etcd/ssl/server-key.pem \
  --peer-trusted-ca-file=/etc/etcd/ssl/ca.pem
Restart=always
RestartSec=5
LimitNOFILE=65536
[Install]
WantedBy=multi-user.target
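3.3.2 k3s2:
Copy the service file from k3s1 and change:
- --name=etcd2
- Every 192.168.18.11 becomes 192.168.18.12, except in the --initial-cluster value, which stays unchanged
3.3.3 k3s3:
Copy the service file from k3s1 and change:
- --name=etcd3
- Every 192.168.18.11 becomes 192.168.18.13, except in the --initial-cluster value, which stays unchanged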
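The per-node changes described above are mechanical, so they can be derived from the k3s1 file instead of edited by hand. The sketch below is one way to do that with awk. render_etcd_unit is a hypothetical helper, and it assumes the k3s1 file uses the name etcd1 and the address 192.168.18.11 exactly as shown in 3.3.1.

```shell
#!/bin/sh
# render_etcd_unit SRC NAME IP
# Hypothetical helper: print a copy of the k3s1 unit file SRC adapted for
# another member. Lines containing --initial-cluster= are passed through
# untouched; on all other lines 192.168.18.11 becomes IP and the
# "--name etcd1" (or "--name=etcd1") flag becomes "--name NAME".
render_etcd_unit() {
    awk -v name="$2" -v ip="$3" '
        /--initial-cluster=/ { print; next }
        {
            gsub(/192\.168\.18\.11/, ip)
            sub(/--name[ =]etcd1/, "--name " name)
            print
        }
    ' "$1"
}

# Example, run on k3s1 (the output paths are placeholders):
#   render_etcd_unit /etc/systemd/system/etcd.service etcd2 192.168.18.12 > /tmp/etcd-k3s2.service
#   render_etcd_unit /etc/systemd/system/etcd.service etcd3 192.168.18.13 > /tmp/etcd-k3s3.service
```

Each generated file would then be copied to /etc/systemd/system/etcd.service on the matching node.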
3.4 Start the etcd cluster
Run on all three k3s masters:
mkdir -p /data/etcd
systemctl daemon-reload
systemctl enable etcd
systemctl start etcd
systemctl status etcd
3.5 Verify the cluster
3.5.1 Check cluster health
export ETCDCTL_API=3
etcdctl --endpoints=https://192.168.18.11:2379,https://192.168.18.12:2379,https://192.168.18.13:2379 \
  --cacert=/etc/etcd/ssl/ca.pem \
  --cert=/etc/etcd/ssl/server.pem \
  --key=/etc/etcd/ssl/server-key.pem \
  endpoint health
Output:
https://192.168.18.11:2379 is healthy: successfully committed proposal: took = 10.53214ms
https://192.168.18.12:2379 is healthy: successfully committed proposal: took = 19.628183ms
https://192.168.18.13:2379 is healthy: successfully committed proposal: took = 20.344884ms
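For scripted checks, the health report can be reduced to a count. The helpers below are a hypothetical sketch that parses the report format shown above; they do not call etcdctl themselves, so the report has to be piped in.

```shell
#!/bin/sh
# count_healthy
# Read `etcdctl endpoint health` output on stdin and print how many
# lines contain "is healthy".
count_healthy() {
    # grep -c prints 0 but exits non-zero when nothing matches.
    grep -c 'is healthy' || true
}

# require_healthy EXPECTED
# Succeed only if exactly EXPECTED endpoints on stdin are healthy.
require_healthy() {
    [ "$(count_healthy)" -eq "$1" ]
}
```

A possible use is to append `2>&1 | require_healthy 3 && echo "etcd OK"` to the endpoint health command above. The `2>&1` is a precaution in case your etcdctl version writes the report to stderr.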
3.5.2 List cluster members
etcdctl --endpoints=https://192.168.18.12:2379 \
  --cacert=/etc/etcd/ssl/ca.pem \
  --cert=/etc/etcd/ssl/server.pem \
  --key=/etc/etcd/ssl/server-key.pem \
  member list
Output:
632ec20f66831869, started, etcd1, https://192.168.18.11:2380, https://192.168.18.11:2379, false
8fcad094509e23f8, started, etcd2, https://192.168.18.12:2380, https://192.168.18.12:2379, false
eb8619032bd3933c, started, etcd3, https://192.168.18.13:2380, https://192.168.18.13:2379, false
3.5.3 Check the cluster leader
ETCDCTL_API=3 etcdctl --endpoints=https://192.168.18.11:2379 \
  --cacert=/etc/etcd/ssl/ca.pem --cert=/etc/etcd/ssl/server.pem --key=/etc/etcd/ssl/server-key.pem \
  endpoint status --write-out=table
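Output (the endpoint queried here is a follower; list all three endpoints in --endpoints, or add --cluster, to see which member is the leader):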
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|          ENDPOINT           |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://192.168.18.11:2379  | 632ec20f66831869 |  3.5.21 |  8.4 MB |     false |      false |         4 |     583509 |             583509 |        |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
3.5.4 Read/write test
ETCDCTL_API=3 etcdctl --endpoints=https://192.168.18.11:2379 \
  --cacert=/etc/etcd/ssl/ca.pem --cert=/etc/etcd/ssl/server.pem --key=/etc/etcd/ssl/server-key.pem \
  put foo bar
ETCDCTL_API=3 etcdctl --endpoints=https://192.168.18.11:2379 \
  --cacert=/etc/etcd/ssl/ca.pem --cert=/etc/etcd/ssl/server.pem --key=/etc/etcd/ssl/server-key.pem \
  get foo
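Every command in this section repeats the same endpoint and TLS flags. etcdctl also reads its flags from ETCDCTL_-prefixed environment variables, so they can be set once per shell session. The sketch below uses the paths and addresses from this guide; etcd_endpoints is a hypothetical helper for building the comma-separated endpoint list.

```shell
#!/bin/sh
# etcd_endpoints IP...
# Hypothetical helper: join IPs into a comma-separated list of https
# client URLs on port 2379.
etcd_endpoints() {
    list=
    for ip in "$@"; do
        list="${list:+$list,}https://$ip:2379"
    done
    printf '%s\n' "$list"
}

export ETCDCTL_API=3
export ETCDCTL_ENDPOINTS="$(etcd_endpoints 192.168.18.11 192.168.18.12 192.168.18.13)"
export ETCDCTL_CACERT=/etc/etcd/ssl/ca.pem
export ETCDCTL_CERT=/etc/etcd/ssl/server.pem
export ETCDCTL_KEY=/etc/etcd/ssl/server-key.pem

# With these set, the earlier checks shorten to, for example:
#   etcdctl endpoint health
#   etcdctl endpoint status --write-out=table
#   etcdctl put foo bar && etcdctl get foo
```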
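The etcd cluster is now installed, so the k3s installation can begin.
4. Install the k3s cluster
4.1 Download the k3s installation files
Download page: https://github.com/k3s-io/k3s/releases
This guide uses version 1.32.6.
Download the k3s executable and the air-gap image bundle (k3s-airgap-images-amd64.tar.gz).
Upload the k3s executable to /usr/local/bin on every machine; the image bundle can stay in /root on each machine for now.
Download the k3s install script on every machine (if the mirror link below stops working, the same script is published on the official site at https://get.k3s.io):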
cd /root
curl -O https://rancher-mirror.rancher.cn/k3s/k3s-install.sh
chmod +x /root/k3s-install.sh
chmod +x /usr/local/bin/k3s
4.2 Tune system parameters (run on every machine)
echo "*   soft    nofile  65535" >> /etc/security/limits.conf
echo "*   hard    nofile  65535" >> /etc/security/limits.conf
cat << eof > /etc/modules-load.d/ipvs.conf 
ip_vs
ip_vs_rr 
ip_vs_wrr 
ip_vs_sh 
eof
cat << eof > /etc/modules-load.d/k8s.conf 
overlay 
br_netfilter 
nf_conntrack 
eof
Load the kernel modules and confirm they are present, then write and apply the sysctl settings:
cat /etc/modules-load.d/{ipvs,k8s}.conf | xargs -n 1 modprobe
lsmod | grep -e ip_vs -e br_netfilter -e nf_conntrack -e overlay
cat << eof > /etc/sysctl.conf 
vm.swappiness=0 
net.ipv4.ip_forward = 1 
net.bridge.bridge-nf-call-ip6tables = 1 
net.bridge.bridge-nf-call-iptables = 1 
net.bridge.bridge-nf-call-arptables =1 
eof
sysctl -p
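After `sysctl -p` it is worth confirming that the values actually took effect, since the net.bridge keys only exist once br_netfilter is loaded. The helpers below are a hypothetical sketch that reads the live values from /proc/sys. The PROC_SYS_ROOT variable is an invention of this example so the check can be pointed at a scratch directory.

```shell
#!/bin/sh
# sysctl_path KEY
# Map a sysctl key such as net.ipv4.ip_forward to its file under
# /proc/sys (or under $PROC_SYS_ROOT when that variable is set).
sysctl_path() {
    printf '%s/%s\n' "${PROC_SYS_ROOT:-/proc/sys}" "$(printf '%s' "$1" | tr . /)"
}

# check_sysctl KEY EXPECTED
# Succeed if the live value of KEY equals EXPECTED; otherwise print a
# note to stderr and fail.
check_sysctl() {
    actual=$(cat "$(sysctl_path "$1")" 2>/dev/null) || actual=
    if [ "$actual" != "$2" ]; then
        echo "$1: expected $2, got ${actual:-(missing)}" >&2
        return 1
    fi
}

# Example, using the values written to /etc/sysctl.conf above:
#   check_sysctl net.ipv4.ip_forward 1
#   check_sysctl net.bridge.bridge-nf-call-iptables 1
#   check_sysctl net.bridge.bridge-nf-call-ip6tables 1
```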
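4.3 Initialize the cluster on k3s1
4.3.1 Set the parameters and run the script to initialize the cluster
For the meaning of each parameter, see: https://docs.rancher.cn/docs/k3s/installation/install-options/_index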
export K3S_DATASTORE_ENDPOINT='https://192.168.18.11:2379,https://192.168.18.12:2379,https://192.168.18.13:2379'
export INSTALL_K3S_SKIP_DOWNLOAD=true
export K3S_DATASTORE_CAFILE='/etc/etcd/ssl/ca.pem' 
export K3S_DATASTORE_CERTFILE='/etc/etcd/ssl/server.pem' 
export K3S_DATASTORE_KEYFILE='/etc/etcd/ssl/server-key.pem'
export INSTALL_K3S_MIRROR=cn
export INSTALL_K3S_EXEC="--flannel-backend=none  --disable-network-policy --cluster-cidr=182.18.0.0/16 --data-dir=/data/rancher/k3s --write-kubeconfig-mode=644 --tls-san 192.168.18.11 --tls-san 192.168.18.12 --tls-san 192.168.18.13 " 
sh k3s-install.sh server
4.3.2 Install the Calico network plugin
kubectl apply -f https://raw.githubusercontent.com/projectcalico/calico/v3.27.0/manifests/calico.yaml
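One detail worth checking here: the k3s servers are started with --cluster-cidr=182.18.0.0/16, while the stock calico.yaml carries its own default IPv4 pool. In the manifests I am aware of that default is 192.168.0.0/16, which would not match the cluster CIDR and would overlap the node network 192.168.18.0/24. The sketch below is an illustration under stated assumptions, not a step from the original walkthrough. It assumes the downloaded manifest still contains the commented-out CALICO_IPV4POOL_CIDR entry with that default value. set_calico_pool_cidr is a hypothetical helper that uncomments the entry and substitutes the pod CIDR. Inspect the result before applying it.

```shell
#!/bin/sh
# set_calico_pool_cidr MANIFEST CIDR
# Hypothetical helper: print MANIFEST with the commented-out
# CALICO_IPV4POOL_CIDR entry enabled and set to CIDR.
# Assumes the entry looks like this in the manifest:
#     # - name: CALICO_IPV4POOL_CIDR
#     #   value: "192.168.0.0/16"
set_calico_pool_cidr() {
    sed -e 's|# - name: CALICO_IPV4POOL_CIDR|- name: CALICO_IPV4POOL_CIDR|' \
        -e "s|#   value: \"192.168.0.0/16\"|  value: \"$2\"|" \
        "$1"
}

# Example:
#   curl -fLO https://raw.githubusercontent.com/projectcalico/calico/v3.27.0/manifests/calico.yaml
#   set_calico_pool_cidr calico.yaml 182.18.0.0/16 > calico-custom.yaml
#   grep -n -A1 CALICO_IPV4POOL_CIDR calico-custom.yaml   # review the change
#   kubectl apply -f calico-custom.yaml
```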
4.3.3 View the cluster token
cat /data/rancher/k3s/server/token
This token is the sole credential that nodes use to join the cluster; copy its contents.
4.4 Join the remaining master nodes
Run the following on k3s2 and k3s3. The --token value is the content of /data/rancher/k3s/server/token.
export K3S_DATASTORE_ENDPOINT='https://192.168.18.11:2379,https://192.168.18.12:2379,https://192.168.18.13:2379'
export INSTALL_K3S_SKIP_DOWNLOAD=true
export K3S_DATASTORE_CAFILE='/etc/etcd/ssl/ca.pem' 
export K3S_DATASTORE_CERTFILE='/etc/etcd/ssl/server.pem' 
export K3S_DATASTORE_KEYFILE='/etc/etcd/ssl/server-key.pem'
export INSTALL_K3S_MIRROR=cn
export INSTALL_K3S_EXEC="--flannel-backend=none  --disable-network-policy --cluster-cidr=182.18.0.0/16 --data-dir=/data/rancher/k3s --write-kubeconfig-mode=644 --tls-san 192.168.18.11 --tls-san 192.168.18.12 --tls-san 192.168.18.13 --token Kxxxxx56427xxxx2f684b327xxxxxxxx::server:xxxxxx8xxxxxxxxxxxx390d8xxxxx12" 
sh k3s-install.sh server 
At this point kubectl get node shows the nodes as NotReady because the image bundle has not been imported yet. Import it:
cd /root
ctr -n k8s.io images import k3s-airgap-images-amd64.tar.gz
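4.5 Join the worker nodes
4.5.1 Configure nginx (run on every worker node)
Install nginx on each worker node as a local load balancer in front of the three API servers, which keeps the agents connected if one master fails:
yum install -y nginx nginx-mod-stream
vim /etc/nginx/nginx.conf
Add the following stream block at the top level of the file (outside the http block):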
stream {
    log_format  main  '$remote_addr $upstream_addr - [$time_local] $status $upstream_bytes_sent';
    access_log  /var/log/nginx/stream-access.log  main;
    include /etc/nginx/conf.d/stream/*.conf;
}
mkdir /etc/nginx/conf.d/stream/
cat /etc/nginx/conf.d/stream/k3s.conf
upstream k3s-apiserver {
    server 192.168.18.11:6443 max_fails=3 fail_timeout=10s;
    server 192.168.18.12:6443 max_fails=3 fail_timeout=10s;
    server 192.168.18.13:6443 max_fails=3 fail_timeout=10s;
}
server {
    listen 6444;
    proxy_pass k3s-apiserver;
}
systemctl enable nginx --now
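The k3s.conf file above is shown only as `cat` output, so here is one possible way to generate it. render_k3s_stream_conf is a hypothetical helper; the upstream name, port 6443, and the max_fails/fail_timeout values are taken from the configuration in this section.

```shell
#!/bin/sh
# render_k3s_stream_conf PORT IP...
# Hypothetical helper: print an nginx stream config that listens on PORT
# and balances across the k3s API servers at IP:6443.
render_k3s_stream_conf() {
    port=$1
    shift
    echo "upstream k3s-apiserver {"
    for ip in "$@"; do
        echo "    server $ip:6443 max_fails=3 fail_timeout=10s;"
    done
    echo "}"
    echo "server {"
    echo "    listen $port;"
    echo "    proxy_pass k3s-apiserver;"
    echo "}"
}

# Example (as root on a worker node):
#   render_k3s_stream_conf 6444 192.168.18.11 192.168.18.12 192.168.18.13 \
#       > /etc/nginx/conf.d/stream/k3s.conf
#   nginx -t && systemctl reload nginx
```

Running `nginx -t` before the reload catches syntax errors, such as a stream block placed inside http, before they take the proxy down.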
4.5.2 Join the worker nodes to the cluster
Run on every k3s worker node:
cd /root/
export INSTALL_K3S_SKIP_DOWNLOAD=true
export INSTALL_K3S_EXEC="agent --disable-apiserver-lb --data-dir=/data/rancher/k3s --token Kxxxxx56427xxxx2f684b327xxxxxxxx::server:xxxxxx8xxxxxxxxxxxx390d8xxxxx12" 
export K3S_URL=https://127.0.0.1:6444
sh k3s-install.sh
Import the images:
cd /root/
ctr -n k8s.io images import k3s-airgap-images-amd64.tar.gz
4.6 Check the cluster status
Check the cluster status on a master node:
[root@k3s1 ~]# kubectl get node
NAME   STATUS   ROLES                  AGE   VERSION
k3s1   Ready    control-plane,master   1h    v1.32.6+k3s1
k3s2   Ready    control-plane,master   1h    v1.32.6+k3s1
k3s3   Ready    control-plane,master   1h    v1.32.6+k3s1
k3s4   Ready    <none>                 1h    v1.32.6+k3s1
k3s5   Ready    <none>                 1h    v1.32.6+k3s1
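Nodes can take a little while to become Ready after the images are imported, so a script may want to wait rather than check once. The helper below is a hypothetical sketch that counts Ready rows in the table format shown above; it does not call kubectl itself.

```shell
#!/bin/sh
# count_ready_nodes
# Read `kubectl get node` output on stdin and print the number of rows
# whose STATUS column is exactly "Ready" (the header row is skipped).
count_ready_nodes() {
    awk 'NR > 1 && $2 == "Ready" { n++ } END { print n + 0 }'
}

# Example: wait until all five nodes from the cluster plan are Ready.
#   until [ "$(kubectl get node | count_ready_nodes)" -eq 5 ]; do sleep 5; done
```

kubectl also offers `kubectl wait --for=condition=Ready node --all --timeout=300s`, which may be simpler when every registered node is expected to come up.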
List the pods:
[root@k3s1 ~]# kubectl get pod -A
NAMESPACE     NAME                                       READY   STATUS      RESTARTS   AGE
kube-system   calico-kube-controllers-75cd4cc5b9-h8xzh   1/1     Running     0          42h
kube-system   calico-node-4fgf2                          1/1     Running     0          40h
kube-system   calico-node-5q5nz                          1/1     Running     0          41h
kube-system   calico-node-7thbf                          1/1     Running     0          42h
kube-system   calico-node-9tgg5                          1/1     Running     0          40h
kube-system   calico-node-ngfhh                          1/1     Running     0          42h
kube-system   coredns-5688667fd4-vgqnm                   1/1     Running     0          42h
kube-system   helm-install-traefik-77lkk                 0/1     Completed   2          42h
kube-system   helm-install-traefik-crd-wtqnh             0/1     Completed   0          42h
kube-system   local-path-provisioner-774c6665dc-2dlx2    1/1     Running     0          42h
kube-system   metrics-server-6f4c6675d5-swttr            1/1     Running     0          42h
kube-system   svclb-traefik-9431d9be-2cjj9               2/2     Running     0          39h
kube-system   svclb-traefik-9431d9be-6wjxd               2/2     Running     0          41h
kube-system   svclb-traefik-9431d9be-84mhk               2/2     Running     0          39h
kube-system   svclb-traefik-9431d9be-jwpsz               2/2     Running     0          42h
kube-system   svclb-traefik-9431d9be-l9z8f               2/2     Running     0          39h
kube-system   traefik-c98fdf6fb-nx8wq                    1/1     Running     0          42h
The cluster installation is complete. If you find the ctr command awkward, you can install nerdctl, whose command-line usage mirrors docker.
5. References
- Deploying a highly available K3s v1.30.6+k3s1 cluster on Rocky Linux 9.2, by reche, Rocky Linux Chinese community, https://www.rockylinux.cn
- https://docs.rancher.cn/
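Sometimes I feel lost. There is so much to learn, and I cannot find a good place to start. The world can feel cold: whom can I ask, and who is willing to share what they have learned? So let it start with us. Knowledge should be open source! I like to call this knowledge communism, and I hope we can all make progress together.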