Zero to JupyterHub with Kubernetes, Part 2 - JupyterHub on k8s

Published: 2025-02-10

Preface: these are purely personal notes.

Official documentation: Zero to JupyterHub with Kubernetes

**Version compatibility:** This documentation is for Helm chart version 2.0.0, which deploys JupyterHub version 3.0.0 and other components versioned in hub/images/requirements.txt. The Helm chart requires Kubernetes >=1.20.0 and Helm >=3.5.

Component           Version
kubernetes          v1.20.4
jupyterhub chart    2.0.0
helm                v3.12.3

Part 1: Setup Kubernetes

1. Setup Kubernetes

Offline binary deployment of Kubernetes v1.20.4

[root@k8s-master /data/s0/kubernetes]$ kubectl version --short
Client Version: v1.20.4
Server Version: v1.20.4
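
Before moving on, it is worth confirming that every node has joined the cluster and is Ready; the three-node layout (k8s-master, k8s-node1, k8s-node2) is assumed from the commands used later in this post:

# Confirm all nodes are Ready and the core system pods are Running
[root@k8s-master /data/s0/kubernetes]$ kubectl get nodes -o wide
[root@k8s-master /data/s0/kubernetes]$ kubectl get pods -n kube-system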

2. Setting up Helm

File shared via Baidu Netdisk: helm-v3.12.3-linux-amd64.tar.gz
Link: https://pan.baidu.com/s/1f8xONKHWshHxieu7jEN4yA
Extraction code: 1234
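
If the download machine has internet access, the same archive can also be fetched from the official Helm release host instead of the netdisk link (standard get.helm.sh URL layout; adjust if your environment uses a mirror):

# Download helm v3.12.3 directly from the official release host
> curl -fsSL -o helm-v3.12.3-linux-amd64.tar.gz https://get.helm.sh/helm-v3.12.3-linux-amd64.tar.gz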

# Extract and install
[root@k8s-master /data/s0/kubernetes/helm]$ tar -xzvf helm-v3.12.3-linux-amd64.tar.gz
[root@k8s-master /data/s0/kubernetes/helm]$ ln -s /data/s0/kubernetes/helm/linux-amd64/helm /usr/local/bin
# Verify
[root@k8s-master /data/s0/kubernetes/helm]$ helm version
version.BuildInfo{Version:"v3.12.3", GitCommit:"3a31588ad33fe3b89af5a2a54ee1d25bfe6eaa5e", GitTreeState:"clean", GoVersion:"go1.20.7"}

Part 2: Setup JupyterHub

1. Installing JupyterHub

1.1 Download the required JupyterHub chart version

JupyterHub's Helm chart repository --> jupyterhub-2.0.0.tgz

File shared via Baidu Netdisk: jupyterhub-2.0.0.tgz
Link: https://pan.baidu.com/s/1ZrEHC9al29ye7n0W3UAi3g
Extraction code: 1234
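
On a machine with internet access, the chart can also be pulled straight from the JupyterHub Helm chart repository instead of the netdisk link (repository URL as given in the official docs; verify it against your environment):

# Add the JupyterHub Helm repo and download the 2.0.0 chart archive
> helm repo add jupyterhub https://hub.jupyter.org/helm-chart/
> helm repo update
> helm pull jupyterhub/jupyterhub --version 2.0.0   # produces jupyterhub-2.0.0.tgz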

1.2 Download the required offline images
# Extract the chart
[root@k8s-master /data/s0/kubernetes/helm]$ tar -xzvf jupyterhub-2.0.0.tgz   # jupyterhub chart 
# List the images the chart needs
[root@k8s-master /data/s0/kubernetes/helm]$ cat jupyterhub/Chart.yaml
annotations:
  artifacthub.io/images: |
    - image: jupyterhub/configurable-http-proxy:4.5.3
      name: configurable-http-proxy
    - image: jupyterhub/k8s-hub:2.0.0
      name: k8s-hub
    - image: jupyterhub/k8s-image-awaiter:2.0.0
      name: k8s-image-awaiter
    - image: jupyterhub/k8s-network-tools:2.0.0
      name: k8s-network-tools
    - image: jupyterhub/k8s-secret-sync:2.0.0
      name: k8s-secret-sync
    - image: jupyterhub/k8s-singleuser-sample:2.0.0
      name: k8s-singleuser-sample
    - image: k8s.gcr.io/kube-scheduler:v1.23.10  # this version caused problems during helm upgrade; switched to v1.20.15 (update the image tag in values.yaml to match)
      name: kube-scheduler
    - image: k8s.gcr.io/pause:3.8  # already present from the k8s deployment; make sure the tag in values.yaml matches
      name: pause
    - image: k8s.gcr.io/pause:3.8
      name: pause
    - image: traefik:v2.8.4
      name: traefik
      
# On an internet-connected machine, pull the images and save them locally
# 1. Pull and save jupyterhub/configurable-http-proxy:4.5.3
> docker pull quay.io/jupyterhub/configurable-http-proxy:4.5.3
> docker tag quay.io/jupyterhub/configurable-http-proxy:4.5.3 jupyterhub/configurable-http-proxy:4.5.3
> docker save -o configurable-http-proxy:4.5.3.tar jupyterhub/configurable-http-proxy:4.5.3  

# 2. Pull and save jupyterhub/k8s-hub:2.0.0
> docker pull quay.io/jupyterhub/k8s-hub:2.0.0
> docker tag quay.io/jupyterhub/k8s-hub:2.0.0 jupyterhub/k8s-hub:2.0.0
> docker save -o k8s-hub:2.0.0.tar jupyterhub/k8s-hub:2.0.0

# 3. Pull and save jupyterhub/k8s-image-awaiter:2.0.0
> docker pull quay.io/jupyterhub/k8s-image-awaiter:2.0.0
> docker tag  quay.io/jupyterhub/k8s-image-awaiter:2.0.0 jupyterhub/k8s-image-awaiter:2.0.0

> docker save -o k8s-image-awaiter:2.0.0.tar jupyterhub/k8s-image-awaiter:2.0.0
# 4. Pull and save jupyterhub/k8s-network-tools:2.0.0
> docker pull quay.io/jupyterhub/k8s-network-tools:2.0.0
> docker tag  quay.io/jupyterhub/k8s-network-tools:2.0.0 jupyterhub/k8s-network-tools:2.0.0
> docker save -o k8s-network-tools:2.0.0.tar jupyterhub/k8s-network-tools:2.0.0

# 5. Pull and save jupyterhub/k8s-secret-sync:2.0.0
> docker pull quay.io/jupyterhub/k8s-secret-sync:2.0.0
> docker tag quay.io/jupyterhub/k8s-secret-sync:2.0.0 jupyterhub/k8s-secret-sync:2.0.0
> docker save -o k8s-secret-sync:2.0.0.tar jupyterhub/k8s-secret-sync:2.0.0

# 6. Pull and save jupyterhub/k8s-singleuser-sample:2.0.0
> docker pull m.daocloud.io/docker.io/jupyterhub/k8s-singleuser-sample:2.0.0
> docker tag m.daocloud.io/docker.io/jupyterhub/k8s-singleuser-sample:2.0.0 jupyterhub/k8s-singleuser-sample:2.0.0
> docker save -o k8s-singleuser-sample:2.0.0.tar jupyterhub/k8s-singleuser-sample:2.0.0

# 7. Pull and save k8s.gcr.io/kube-scheduler:v1.20.15
> docker pull k8s-gcr.m.daocloud.io/kube-scheduler:v1.20.15
> docker tag k8s-gcr.m.daocloud.io/kube-scheduler:v1.20.15  k8s.gcr.io/kube-scheduler:v1.20.15
> docker save -o kube-scheduler:v1.20.15.tar k8s.gcr.io/kube-scheduler:v1.20.15

# 8. Pull and save traefik:v2.8.4
> docker pull m.daocloud.io/docker.io/library/traefik:v2.8.4
> docker tag m.daocloud.io/docker.io/library/traefik:v2.8.4 traefik:v2.8.4
> docker save -o traefik:v2.8.4.tar traefik:v2.8.4

## 9. Package the offline images and upload them to the cluster
> tar -czvf jupyterhub-chart-images.tgz ./*
> scp jupyterhub-chart-images.tgz k8s-master:/data/s0/kubernetes/helm
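
The eight pull/tag/save steps above can also be driven by a small loop. A minimal sketch, reusing the same mirror prefixes as above (the saved .tar file names here use '_' instead of '/', so they differ slightly from the manual commands):

#!/bin/bash
# "mirror source|local tag" pairs, taken from the commands above
images=(
  "quay.io/jupyterhub/configurable-http-proxy:4.5.3|jupyterhub/configurable-http-proxy:4.5.3"
  "quay.io/jupyterhub/k8s-hub:2.0.0|jupyterhub/k8s-hub:2.0.0"
  "quay.io/jupyterhub/k8s-image-awaiter:2.0.0|jupyterhub/k8s-image-awaiter:2.0.0"
  "quay.io/jupyterhub/k8s-network-tools:2.0.0|jupyterhub/k8s-network-tools:2.0.0"
  "quay.io/jupyterhub/k8s-secret-sync:2.0.0|jupyterhub/k8s-secret-sync:2.0.0"
  "m.daocloud.io/docker.io/jupyterhub/k8s-singleuser-sample:2.0.0|jupyterhub/k8s-singleuser-sample:2.0.0"
  "k8s-gcr.m.daocloud.io/kube-scheduler:v1.20.15|k8s.gcr.io/kube-scheduler:v1.20.15"
  "m.daocloud.io/docker.io/library/traefik:v2.8.4|traefik:v2.8.4"
)
for pair in "${images[@]}"; do
  src="${pair%%|*}"; dst="${pair##*|}"
  docker pull "$src"
  docker tag "$src" "$dst"
  docker save -o "$(echo "$dst" | tr '/' '_').tar" "$dst"   # e.g. jupyterhub_k8s-hub:2.0.0.tar
done
tar -czvf jupyterhub-chart-images.tgz ./*.tar
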
1.3 Load the images
# ------------------ k8s-master, k8s-node1, k8s-node2 ----------------------------
# 1. Load the images; repeat on node1 and node2
[root@k8s-master /data/s0/kubernetes/helm]$ mkdir -p chart-images && tar -xzvf jupyterhub-chart-images.tgz -C ./chart-images
[root@k8s-master /data/s0/kubernetes/helm/chart-images]$ docker load -i configurable-http-proxy:4.5.3.tar
[root@k8s-master /data/s0/kubernetes/helm/chart-images]$ docker load -i k8s-hub:2.0.0.tar
[root@k8s-master /data/s0/kubernetes/helm/chart-images]$ docker load -i k8s-image-awaiter:2.0.0.tar
[root@k8s-master /data/s0/kubernetes/helm/chart-images]$ docker load -i k8s-network-tools:2.0.0.tar
[root@k8s-master /data/s0/kubernetes/helm/chart-images]$ docker load -i k8s-secret-sync:2.0.0.tar
[root@k8s-master /data/s0/kubernetes/helm/chart-images]$ docker load -i k8s-singleuser-sample:2.0.0.tar
[root@k8s-master /data/s0/kubernetes/helm/chart-images]$ docker load -i kube-scheduler:v1.20.15.tar
[root@k8s-master /data/s0/kubernetes/helm/chart-images]$ docker load -i traefik:v2.8.4.tar

# 2. Load the custom user data-science environment; the default single-user server image is k8s-singleuser-sample
# docker pull m.daocloud.io/docker.io/jupyter/datascience-notebook pulls :latest by default -- pin a specific tag, otherwise a different image may be pulled each time
[root@k8s-master /data/s0/kubernetes/helm]$ docker load -i datascience-notebook.tar
# Note: if imagePullPolicy is not set and the image tag is :latest, it defaults to "Always" (always pull from the registry);
# if the tag is not :latest, it defaults to "IfNotPresent" (use the local image if present, otherwise pull from the registry).
[root@k8s-master /data/s0/kubernetes/helm]$ docker tag jupyter/datascience-notebook:latest jupyter/datascience-notebook:2023.10.23
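
After loading, confirm on the master and on every node that the local tags match exactly what config.yaml references below; with pullPolicy IfNotPresent, a tag mismatch silently falls back to a registry pull that will fail offline:

# Verify the expected image tags exist locally (run on every node)
[root@k8s-master /data/s0/kubernetes/helm]$ docker images | grep -E 'jupyterhub|kube-scheduler|traefik|pause|datascience'
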
1.4 JupyterHub configuration
# Custom JupyterHub configuration
[root@datanode40 /data/s0/kubernetes/helm]$ touch config.yaml
[root@datanode40 /data/s0/kubernetes/helm]$ vim config.yaml

config.yaml contents:

# Application name (used for the deployment, service, pod and other resource names)
fullnameOverride: "jupyterhub"

# Registry credentials for image pulls (not needed here -- local offline images are used)
imagePullSecret:
  create: false
  automaticReferenceInjection: false

# hub pod configuration (authentication)
hub:
  revisionHistoryLimit: 1                  # number of old revisions kept by Kubernetes
  config:                                  # entries written into jupyterhub_config.py
    JupyterHub:
      admin_access: true
      admin_users: 
         - zyp                             # admin user
      authenticator_class: dummy           # authentication; the dummy authenticator is for testing only
  service:
    type: ClusterIP                        
    ports:
      nodePort:
  db:
    type: sqlite-pvc                       # database JupyterHub uses to store users, server state, activity, etc.
    pvc:                                   # the matching PV must be created beforehand (see 1.4.1)
      accessModes:
        - ReadWriteOnce
      storage: 2Gi
      subPath: sqlite                      # subpath inside the PV (defaults to the root)
      storageClassName: sqlite-pv          # storage class
  image:
    name: jupyterhub/k8s-hub
    tag: "2.0.0"
    pullPolicy: IfNotPresent

# chp (configurable-http-proxy) pod: public proxy and HTTPS settings
proxy:
  service:
    type: NodePort                            # externally exposed proxy service
    nodePorts:
      http: 30081
  chp:                                         # configurable-http-proxy (chp) settings
    revisionHistoryLimit: 1
    image:
      name: jupyterhub/configurable-http-proxy
      tag: "4.5.3" 
      pullPolicy: IfNotPresent
  https:
    enabled: false                              # HTTPS disabled


# Single-user Jupyter servers
singleuser:
  networkTools:
    image:
      name: jupyterhub/k8s-network-tools
      tag: "2.0.0"
      pullPolicy: IfNotPresent
  networkPolicy:
    enabled: true
    egressAllowRules:
      privateIPs: true                           # allow egress to private IP ranges (e.g. LAN / in-cluster services)
  storage:                                       # single-user storage configuration
    type: static                                 # static (pre-created) PVC
    static:
      pvcName: notebook-pvc                      # PVC name; create the PVC and PV manually (see 1.4.1)
      subPath: "{username}"
    capacity: 10Gi
    homeMountPath: /home/jovyan                  # where the home directory is mounted inside the container
  # Defines the default image  
  image:
    name: jupyterhub/k8s-singleuser-sample        # replace with your own scientific-computing image
    tag: "2.0.0"
    pullPolicy: IfNotPresent 
  profileList:                                    # selectable user environments
    - display_name: "sample environment"
      description: "To avoid too much bells and whistles: Python."
      default: true
    - display_name: "Datascience environment"
      description: "If you want the additional bells and whistles: Python, R, and Julia."
      kubespawner_override:
        image: jupyter/datascience-notebook:v1
        pullPolicy: IfNotPresent
  startTimeout: 300
  cpu:
    limit:
    guarantee: 0.5
  memory:
    limit:
    guarantee: 1G
  cmd: jupyterhub-singleuser       # command run inside the container to start the single-user server
  defaultUrl: "/lab"               # default user interface (JupyterLab)
  extraEnv:
    JUPYTERHUB_SINGLEUSER_APP: "jupyter_server.serverapp.ServerApp"  

# Kubernetes scheduling settings
scheduling:
  userScheduler:
    revisionHistoryLimit: 1
    image:
      name: k8s.gcr.io/kube-scheduler
      tag: "v1.20.15" 
      pullPolicy: IfNotPresent
  userPlaceholder:
    image:
      name: k8s.gcr.io/pause
      tag: "3.8"
      pullPolicy: IfNotPresent


# Image pre-puller
prePuller:                               
  hook:
    enabled: false                       # offline environment with local images; no pre-pulling needed
    pullOnlyOnChanges: false
  continuous:
    enabled: false
  pullProfileListImages: false


# Culling of idle user servers
cull:
  enabled: true
  users: false # --cull-users
  adminUsers: true # --cull-admin-users
  removeNamedServers: false # --remove-named-servers
  timeout: 3600 # --timeout
  every: 600 # --cull-every
  concurrency: 10 # --concurrency
  maxAge: 0 # --max-age
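
Before deploying, the chart can be rendered locally against config.yaml to catch indentation or value errors early (release name and namespace match what is used in step 1.5):

# Render the manifests locally without touching the cluster
[root@k8s-master /data/s0/kubernetes/helm]$ helm template jupyterhub-release ./jupyterhub --namespace jhub --values config.yaml > /tmp/jupyterhub-rendered.yaml
# Spot-check that only locally loaded image tags appear
[root@k8s-master /data/s0/kubernetes/helm]$ grep -E 'image: ' /tmp/jupyterhub-rendered.yaml | sort -u
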
1.4.1 Pre-create the PV and PVC

For PV basics, see the earlier post on general Kubernetes usage.

# PV for sqlite storage, plus the PV and PVC for the single-user servers
[root@k8s-node1 /data/s0/kubernetes/nfs]$ mkdir pvs
[root@k8s-node1 /data/s0/kubernetes/nfs/pvs]$ vim pvs.yaml
# PV for sqlite storage
apiVersion: v1
kind: PersistentVolume
metadata:
  name: sqlite-pv1      
spec:
  nfs: 
    path: /data/s0/kubernetes/nfs/pv1
    readOnly: false 
    server: k8s-node1
  capacity: 
    storage: 2Gi
  accessModes: 
    - ReadWriteOnce
  storageClassName: sqlite-pv  
  persistentVolumeReclaimPolicy: Retain
---
# PV for the single-user servers
apiVersion: v1
kind: PersistentVolume
metadata:
  name: notebook-pv2      
spec:
  nfs: 
    path: /data/s0/kubernetes/nfs/pv2
    readOnly: false 
    server: k8s-node1
  capacity: 
    storage: 200Gi
  accessModes: 
    - ReadWriteMany 
  storageClassName: single-notebook   
  persistentVolumeReclaimPolicy: Retain 
---  
# PVC for the single-user servers
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: notebook-pvc                # must match pvcName in config.yaml
  namespace: jhub 
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: single-notebook  
  resources: 
    requests:
      storage: 20Gi
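
The two PVs above point at NFS exports on k8s-node1, so those directories must exist (and be exported) before pvs.yaml is applied. A minimal sketch, assuming the NFS server itself is already set up on k8s-node1:

# On the NFS server (k8s-node1): create the export paths referenced by the PVs
[root@k8s-node1 /data/s0/kubernetes/nfs]$ mkdir -p pv1 pv2
[root@k8s-node1 /data/s0/kubernetes/nfs]$ chmod 777 pv1 pv2    # or chown to the notebook UID (jovyan is UID 1000 by default)
# After applying pvs.yaml in step 1.5, confirm the PVs and the PVC bind
[root@k8s-node1 /data/s0/kubernetes/nfs]$ kubectl get pv
[root@k8s-node1 /data/s0/kubernetes/nfs]$ kubectl get pvc -n jhub
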
1.5 Launch JupyterHub
# Create the namespace
[root@k8s-master /data/s0/kubernetes/helm]$ kubectl create ns jhub
# Apply the pre-created PV and PVC
[root@k8s-node1 /data/s0/kubernetes/nfs/pvs]$ kubectl apply -f pvs.yaml
# Deploy JupyterHub
[root@k8s-master /data/s0/kubernetes/helm]$ helm upgrade --cleanup-on-fail \
 --install jupyterhub-release ./jupyterhub \
 --namespace jhub \
 --values config.yaml


# Check pod status (if a pod is stuck in Pending or ContainerCreating --> kubectl --namespace=jhub describe pod <name of pod>)
[root@k8s-master /data/s0/kubernetes/helm]$ kubectl --namespace=jhub get pod
jupyterhub-hub-c87985f75-lkl4f               1/1     Running   0          5m18s
jupyterhub-proxy-5d95bb6786-87cqs            1/1     Running   0          5m18s
jupyterhub-user-scheduler-786c6759c7-2r24k   1/1     Running   0          5m18s
jupyterhub-user-scheduler-786c6759c7-6x5k6   1/1     Running   0          5m18s


# Check that the jupyterhub-proxy-public service is exposed externally
[root@k8s-master /data/s0/kubernetes/helm]$ kubectl --namespace=jhub get svc 
NAME                      TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)        AGE
jupyterhub-hub            ClusterIP   10.0.0.50    <none>        8081/TCP       90s
jupyterhub-proxy-api      ClusterIP   10.0.0.196   <none>        8001/TCP       90s
jupyterhub-proxy-public   NodePort    10.0.0.51    <none>        80:30081/TCP   90s
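
A quick reachability check from any node before opening a browser (30081 is the NodePort set in config.yaml):

# Expect an HTTP status line (e.g. 200 or 302) from the hub login page
[root@k8s-master /data/s0/kubernetes/helm]$ curl -sI http://k8s-master:30081/hub/login | head -n 1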

Problem: Error: rendered manifests contain a resource that already exists. Unable to continue with install: ClusterRole "jupyterhub-user-scheduler" in namespace "" exists and cannot be imported into the current release: invalid ownership metadata; annotation validation error: key "meta.helm.sh/release-name" must equal "jupyterhub-release": current value is "jupyterhub-v1"

Solution: delete the cluster-scoped resources left over from the old release, then rerun the helm upgrade:

kubectl delete clusterrole jupyterhub-user-scheduler

kubectl delete clusterrolebinding jupyterhub-user-scheduler
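
Deleting works when the leftover objects are disposable. Alternatively, Helm 3 can adopt existing resources once their ownership metadata matches the new release; a hedged sketch using the release name jupyterhub-release and namespace jhub from above:

# Let the new release take ownership of the existing cluster-scoped objects, then rerun helm upgrade
kubectl annotate clusterrole jupyterhub-user-scheduler meta.helm.sh/release-name=jupyterhub-release meta.helm.sh/release-namespace=jhub --overwrite
kubectl label clusterrole jupyterhub-user-scheduler app.kubernetes.io/managed-by=Helm --overwrite
kubectl annotate clusterrolebinding jupyterhub-user-scheduler meta.helm.sh/release-name=jupyterhub-release meta.helm.sh/release-namespace=jhub --overwrite
kubectl label clusterrolebinding jupyterhub-user-scheduler app.kubernetes.io/managed-by=Helm --overwrite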

1.6 Verify the JupyterHub service

From a remote host, open http://k8s-master:30081 (the NodePort set in config.yaml)

  • User login page (screenshot)

  • Environment selection page (screenshot)

  • User workspace page (screenshot)

  • Underlying single-user containers (screenshot)

  • Persistent storage check (screenshot)

  • Kubernetes dashboard view (screenshot)

Part 3: Common issues

1. Installing additional packages in the Python environment

## Install additional Python libraries in the container
# 1. Start a container and install the required packages
> docker run -it --user root -e GRANT_SUDO=yes jupyter/datascience-notebook:v1 /bin/bash 
(base) jovyan@31e4bc7354a7:~$ pip install xxxx     # install packages
(base) jovyan@31e4bc7354a7:~$ exit   # leave the container
# 2. Commit the modified container as a new image
> docker commit <container ID or name> jupyter/datascience-notebook:v2
# 3. Re-tag the new image back to the original tag
> docker rmi jupyter/datascience-notebook:v1   # remove the original single-user image
> docker tag jupyter/datascience-notebook:v2 jupyter/datascience-notebook:v1   # docker tag (not commit) re-tags an existing image
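
Because the cluster pulls nothing from a registry, the rebuilt image also has to be re-distributed to every node, and the profileList entry in config.yaml must reference the tag you end up with. A minimal sketch:

# Export the rebuilt image and load it on each node
> docker save -o datascience-notebook-v1.tar jupyter/datascience-notebook:v1
> scp datascience-notebook-v1.tar k8s-node1:/tmp/ && ssh k8s-node1 docker load -i /tmp/datascience-notebook-v1.tar
> scp datascience-notebook-v1.tar k8s-node2:/tmp/ && ssh k8s-node2 docker load -i /tmp/datascience-notebook-v1.tar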

2. DNS resolution of external domains inside pods

# Edit the CoreDNS ConfigMap
[root@k8s-master ~]$ kubectl edit configmap coredns -n kube-system
# Adjust the forward setting as needed (see the example below)

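The original screenshot with the exact change is not available. For reference, external-name resolution from pods usually comes down to the forward plugin in the CoreDNS Corefile; a trimmed, hedged example is shown below (your Corefile will contain additional plugins, and the upstream DNS addresses here are placeholders to replace with resolvers reachable from your cluster):

apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
data:
  Corefile: |
    .:53 {
        errors
        health
        kubernetes cluster.local in-addr.arpa ip6.arpa {
            pods insecure
            fallthrough in-addr.arpa ip6.arpa
        }
        # forward everything that is not a cluster name to an upstream resolver
        # that can reach the outside world (placeholder addresses)
        forward . 223.5.5.5 114.114.114.114
        cache 30
        loop
        reload
        loadbalance
    }

After saving the edit, restart CoreDNS so the change takes effect (e.g. kubectl -n kube-system rollout restart deployment coredns) and retest name resolution from inside a pod.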

