[Docker] Airflow Containerized Deployment

Published: 2024-03-26

This standard Airflow environment is built on the Bitnami Airflow image. The current version is 2.8.2.

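You can install and deploy it directly with the 轻云UC (Qingyun UC) deployment tool, or deploy it manually by following this document. The project is fully open source, and its files are available here:
Configuration files: https://gitee.com/qingplus/qingcloud-platform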

qinghub automated installation and deployment configuration repository

What is Airflow?

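Apache Airflow is a platform for programmatically authoring, scheduling and monitoring workflows, which it expresses and executes as directed acyclic graphs (DAGs).

It includes utilities for:
- scheduling tasks,
- monitoring their progress,
- handling the dependencies between them.

The Airflow scheduler executes tasks on a set of workers while respecting the specified dependencies. Rich command-line utilities make complex operations on DAGs straightforward. The user interface makes it easy to see which pipelines are running in production, monitor their progress, and troubleshoot problems when needed.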

Quick start

docker run --name airflow bitnami/airflow:latest

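Note: This quick start is suitable for development environments only. For a more secure deployment:
- change the insecure default credentials;
- review the configuration options in the "Environment variables" section.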
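Every example in this article reuses the same hard-coded AIRFLOW_FERNET_KEY and AIRFLOW_SECRET_KEY values. The sketch below generates your own values instead. It assumes a POSIX shell with /dev/urandom, head, base64 and tr available (GNU coreutils or a BSD/macOS equivalent). A Fernet key is 32 random bytes encoded as URL-safe base64, which gives 44 characters. For the secret key, any sufficiently random string works, so the same source of randomness is reused here.

```shell
#!/bin/sh
# Sketch: generate fresh keys instead of reusing the sample values from this article.
# Assumes /dev/urandom, head, base64 and tr are available.

# Fernet key: 32 random bytes as URL-safe base64 ('+' -> '-', '/' -> '_'), 44 characters.
gen_fernet_key() {
  head -c 32 /dev/urandom | base64 | tr '+/' '-_'
}

# Web server secret key: any random string; here 32 random bytes in base64, newlines removed.
gen_secret_key() {
  head -c 32 /dev/urandom | base64 | tr -d '\n'
}

AIRFLOW_FERNET_KEY="$(gen_fernet_key)"
AIRFLOW_SECRET_KEY="$(gen_secret_key)"
export AIRFLOW_FERNET_KEY AIRFLOW_SECRET_KEY

echo "AIRFLOW_FERNET_KEY=$AIRFLOW_FERNET_KEY"
echo "AIRFLOW_SECRET_KEY=$AIRFLOW_SECRET_KEY"
```

Pass these values to the web server, scheduler and worker alike, for example with -e AIRFLOW_FERNET_KEY="$AIRFLOW_FERNET_KEY". Keep the Fernet key stable once the database holds data. Connection passwords and variables encrypted with one key cannot be decrypted with a different one.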

Prerequisites

To run this application you need Docker Engine >= 1.10.0. Docker Compose 1.6.0 or later is recommended.

How to use this image

Airflow needs access to a PostgreSQL database to store its information. If you want to use the CeleryExecutor, you also need:
- an Airflow Scheduler,
- one or more Airflow Workers,
- a Redis® server.

Using the Docker command line

  1. Create a network

    docker network create airflow-tier
    
  2. Create a volume for PostgreSQL persistence and create a PostgreSQL container

    docker volume create --name postgresql_data
    docker run -d --name postgresql \
      -e POSTGRESQL_USERNAME=bn_airflow \
      -e POSTGRESQL_PASSWORD=bitnami1 \
      -e POSTGRESQL_DATABASE=bitnami_airflow \
      --net airflow-tier \
      --volume postgresql_data:/bitnami/postgresql \
      bitnami/postgresql:latest
    
  3. Create a volume for Redis® persistence and create a Redis® container

    docker volume create --name redis_data
    docker run -d --name redis \
      -e ALLOW_EMPTY_PASSWORD=yes \
      --net airflow-tier \
      --volume redis_data:/bitnami \
      bitnami/redis:latest
    
  4. Launch the Apache Airflow web container

    docker run -d --name airflow -p 8080:8080 \
      -e AIRFLOW_FERNET_KEY=46BKJoQYlPPOexq0OhDZnIlNepKFf87WFwLbfzqDDho= \
      -e AIRFLOW_SECRET_KEY=a25mQ1FHTUh3MnFRSk5KMEIyVVU2YmN0VGRyYTVXY08= \
      -e AIRFLOW_EXECUTOR=CeleryExecutor \
      -e AIRFLOW_DATABASE_NAME=bitnami_airflow \
      -e AIRFLOW_DATABASE_USERNAME=bn_airflow \
      -e AIRFLOW_DATABASE_PASSWORD=bitnami1 \
      -e AIRFLOW_LOAD_EXAMPLES=yes \
      -e AIRFLOW_PASSWORD=bitnami123 \
      -e AIRFLOW_USERNAME=user \
      -e AIRFLOW_EMAIL=user@example.com \
      --net airflow-tier \
      bitnami/airflow:latest
    
  5. Launch the Apache Airflow scheduler container

    docker run -d --name airflow-scheduler \
      -e AIRFLOW_FERNET_KEY=46BKJoQYlPPOexq0OhDZnIlNepKFf87WFwLbfzqDDho= \
      -e AIRFLOW_SECRET_KEY=a25mQ1FHTUh3MnFRSk5KMEIyVVU2YmN0VGRyYTVXY08= \
      -e AIRFLOW_EXECUTOR=CeleryExecutor \
      -e AIRFLOW_DATABASE_NAME=bitnami_airflow \
      -e AIRFLOW_DATABASE_USERNAME=bn_airflow \
      -e AIRFLOW_DATABASE_PASSWORD=bitnami1 \
      -e AIRFLOW_LOAD_EXAMPLES=yes \
      -e AIRFLOW_WEBSERVER_HOST=airflow \
      --net airflow-tier \
      bitnami/airflow-scheduler:latest
    
  6. Launch the Apache Airflow worker container

    docker run -d --name airflow-worker \
      -e AIRFLOW_FERNET_KEY=46BKJoQYlPPOexq0OhDZnIlNepKFf87WFwLbfzqDDho= \
      -e AIRFLOW_SECRET_KEY=a25mQ1FHTUh3MnFRSk5KMEIyVVU2YmN0VGRyYTVXY08= \
      -e AIRFLOW_EXECUTOR=CeleryExecutor \
      -e AIRFLOW_DATABASE_NAME=bitnami_airflow \
      -e AIRFLOW_DATABASE_USERNAME=bn_airflow \
      -e AIRFLOW_DATABASE_PASSWORD=bitnami1 \
      -e AIRFLOW_WEBSERVER_HOST=airflow \
      --net airflow-tier \
      bitnami/airflow-worker:latest
    

Open the Airflow web UI at: http://your-ip:8080

Persisting your application

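The Airflow containers rely on PostgreSQL and Redis® to hold their data; Airflow itself does not persist anything. To avoid data loss, mount volumes that persist the PostgreSQL data and the Redis® data.

The example above defines the Docker volumes postgresql_data and redis_data. As long as these volumes are not removed, the Airflow application state persists.

To avoid removing these volumes by accident, you can mount host directories as data volumes. Alternatively, you can use volume plugins to host the volume data.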
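Before mounting host directories, they need to exist on the host. The sketch below creates them. DATA_ROOT is a hypothetical base path that stands in for the /path/to/... placeholders used in this article. It defaults to an absolute path under the current directory, because bind mounts given to docker run are safest as absolute paths. The ownership hint assumes the Bitnami images run as their default non-root user, UID 1001.

```shell
#!/bin/sh
# Sketch: prepare host directories for PostgreSQL and Redis(R) data.
# DATA_ROOT is a hypothetical base path; replace it with your own location.
DATA_ROOT="${DATA_ROOT:-$(pwd)/airflow-data}"

mkdir -p "$DATA_ROOT/postgresql-persistence" "$DATA_ROOT/redis-persistence"

# Bitnami images run as a non-root user (UID 1001 by default). If the containers
# cannot write to these directories, adjust ownership (requires root):
#   sudo chown -R 1001:1001 "$DATA_ROOT"

ls -d "$DATA_ROOT"/*-persistence
```

Then use these directories in place of /path/to/postgresql-persistence and /path/to/redis-persistence in the examples that follow.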

Mounting host directories as data volumes with Docker Compose

The following docker-compose.yml template shows how to use host directories as data volumes.

version: '2'
services:
  postgresql:
    image: 'bitnami/postgresql:latest'
    environment:
      - POSTGRESQL_DATABASE=bitnami_airflow
      - POSTGRESQL_USERNAME=bn_airflow
      - POSTGRESQL_PASSWORD=bitnami1
    volumes:
      - /path/to/postgresql-persistence:/bitnami/postgresql
  redis:
    image: 'bitnami/redis:latest'
    environment:
      - ALLOW_EMPTY_PASSWORD=yes
    volumes:
      - /path/to/redis-persistence:/bitnami
  airflow-worker:
    image: bitnami/airflow-worker:latest
    environment:
      - AIRFLOW_FERNET_KEY=46BKJoQYlPPOexq0OhDZnIlNepKFf87WFwLbfzqDDho=
      - AIRFLOW_SECRET_KEY=a25mQ1FHTUh3MnFRSk5KMEIyVVU2YmN0VGRyYTVXY08=
      - AIRFLOW_EXECUTOR=CeleryExecutor
      - AIRFLOW_DATABASE_NAME=bitnami_airflow
      - AIRFLOW_DATABASE_USERNAME=bn_airflow
      - AIRFLOW_DATABASE_PASSWORD=bitnami1
      - AIRFLOW_LOAD_EXAMPLES=yes
  airflow-scheduler:
    image: bitnami/airflow-scheduler:latest
    environment:
      - AIRFLOW_FERNET_KEY=46BKJoQYlPPOexq0OhDZnIlNepKFf87WFwLbfzqDDho=
      - AIRFLOW_SECRET_KEY=a25mQ1FHTUh3MnFRSk5KMEIyVVU2YmN0VGRyYTVXY08=
      - AIRFLOW_EXECUTOR=CeleryExecutor
      - AIRFLOW_DATABASE_NAME=bitnami_airflow
      - AIRFLOW_DATABASE_USERNAME=bn_airflow
      - AIRFLOW_DATABASE_PASSWORD=bitnami1
      - AIRFLOW_LOAD_EXAMPLES=yes
  airflow:
    image: bitnami/airflow:latest
    environment:
      - AIRFLOW_FERNET_KEY=46BKJoQYlPPOexq0OhDZnIlNepKFf87WFwLbfzqDDho=
      - AIRFLOW_SECRET_KEY=a25mQ1FHTUh3MnFRSk5KMEIyVVU2YmN0VGRyYTVXY08=
      - AIRFLOW_EXECUTOR=CeleryExecutor
      - AIRFLOW_DATABASE_NAME=bitnami_airflow
      - AIRFLOW_DATABASE_USERNAME=bn_airflow
      - AIRFLOW_DATABASE_PASSWORD=bitnami1
      - AIRFLOW_PASSWORD=bitnami123
      - AIRFLOW_USERNAME=user
      - AIRFLOW_EMAIL=user@example.com
    ports:
      - '8080:8080'
Mounting host directories as data volumes with the Docker command line
  1. Create a network (if it does not already exist)

    docker network create airflow-tier
    
  2. Create a PostgreSQL container with a host volume

    docker run -d --name postgresql \
      -e POSTGRESQL_USERNAME=bn_airflow \
      -e POSTGRESQL_PASSWORD=bitnami1 \
      -e POSTGRESQL_DATABASE=bitnami_airflow \
      --net airflow-tier \
      --volume /path/to/postgresql-persistence:/bitnami \
      bitnami/postgresql:latest
    
  3. Create a Redis® container with a host volume

    docker run -d --name redis \
      -e ALLOW_EMPTY_PASSWORD=yes \
      --net airflow-tier \
      --volume /path/to/redis-persistence:/bitnami \
      bitnami/redis:latest
    
  4. Create the Airflow container

    docker run -d --name airflow -p 8080:8080 \
      -e AIRFLOW_FERNET_KEY=46BKJoQYlPPOexq0OhDZnIlNepKFf87WFwLbfzqDDho= \
      -e AIRFLOW_SECRET_KEY=a25mQ1FHTUh3MnFRSk5KMEIyVVU2YmN0VGRyYTVXY08= \
      -e AIRFLOW_EXECUTOR=CeleryExecutor \
      -e AIRFLOW_DATABASE_NAME=bitnami_airflow \
      -e AIRFLOW_DATABASE_USERNAME=bn_airflow \
      -e AIRFLOW_DATABASE_PASSWORD=bitnami1 \
      -e AIRFLOW_LOAD_EXAMPLES=yes \
      -e AIRFLOW_PASSWORD=bitnami123 \
      -e AIRFLOW_USERNAME=user \
      -e AIRFLOW_EMAIL=user@example.com \
      --net airflow-tier \
      bitnami/airflow:latest
    
  5. Create the Airflow Scheduler container

    docker run -d --name airflow-scheduler \
      -e AIRFLOW_FERNET_KEY=46BKJoQYlPPOexq0OhDZnIlNepKFf87WFwLbfzqDDho= \
      -e AIRFLOW_SECRET_KEY=a25mQ1FHTUh3MnFRSk5KMEIyVVU2YmN0VGRyYTVXY08= \
      -e AIRFLOW_EXECUTOR=CeleryExecutor \
      -e AIRFLOW_DATABASE_NAME=bitnami_airflow \
      -e AIRFLOW_DATABASE_USERNAME=bn_airflow \
      -e AIRFLOW_DATABASE_PASSWORD=bitnami1 \
      -e AIRFLOW_LOAD_EXAMPLES=yes \
      -e AIRFLOW_WEBSERVER_HOST=airflow \
      --net airflow-tier \
      bitnami/airflow-scheduler:latest
    
  6. Create the Airflow Worker container

    docker run -d --name airflow-worker \
      -e AIRFLOW_FERNET_KEY=46BKJoQYlPPOexq0OhDZnIlNepKFf87WFwLbfzqDDho= \
      -e AIRFLOW_SECRET_KEY=a25mQ1FHTUh3MnFRSk5KMEIyVVU2YmN0VGRyYTVXY08= \
      -e AIRFLOW_EXECUTOR=CeleryExecutor \
      -e AIRFLOW_DATABASE_NAME=bitnami_airflow \
      -e AIRFLOW_DATABASE_USERNAME=bn_airflow \
      -e AIRFLOW_DATABASE_PASSWORD=bitnami1 \
      -e AIRFLOW_WEBSERVER_HOST=airflow \
      --net airflow-tier \
      bitnami/airflow-worker:latest
    

Configuration

Loading DAG files

Custom DAG files can be mounted at /opt/bitnami/airflow/dags.
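The sketch below writes one DAG file into a host directory so it can be mounted at that path. The directory name and the DAG itself (hello_dag.py with two BashOperator tasks) are hypothetical examples, not part of the image. The DAG uses the Airflow 2.x API (the schedule argument, airflow.operators.bash), which matches the 2.8.2 image described here.

```shell
#!/bin/sh
# Sketch: create a host "dags" directory containing a minimal example DAG.
# DAGS_DIR and hello_dag.py are hypothetical names.
DAGS_DIR="${DAGS_DIR:-$(pwd)/dags}"
mkdir -p "$DAGS_DIR"

cat > "$DAGS_DIR/hello_dag.py" <<'EOF'
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# Two tasks; "say_hello" must succeed before "say_done" runs.
with DAG(
    dag_id="hello_dag",
    start_date=datetime(2024, 1, 1),
    schedule=None,  # only runs when triggered manually
    catchup=False,
) as dag:
    hello = BashOperator(task_id="say_hello", bash_command="echo hello")
    done = BashOperator(task_id="say_done", bash_command="echo done")
    hello >> done
EOF

# Mount the directory into the Airflow containers, e.g.:
#   docker run ... --volume "$DAGS_DIR":/opt/bitnami/airflow/dags ... bitnami/airflow-scheduler:latest
echo "DAG written to $DAGS_DIR/hello_dag.py"
```

With the CeleryExecutor, mount the same directory into the scheduler and every worker as well as the web container. The scheduler parses the DAG files and the workers run the tasks.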

Installing additional Python modules

The container can install additional Python modules at startup. To use this, mount a requirements.txt file listing the modules you need at /bitnami/python/requirements.txt.
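The sketch below writes a requirements.txt and shows where to mount it. The package listed (openpyxl==3.1.2) is only a hypothetical example. Pin whatever your DAGs actually import.

```shell
#!/bin/sh
# Sketch: write a requirements.txt to be installed when the container starts.
# REQ_FILE and the listed package are hypothetical examples.
REQ_FILE="${REQ_FILE:-$(pwd)/requirements.txt}"

cat > "$REQ_FILE" <<'EOF'
openpyxl==3.1.2
EOF

# Mount the file at the path the image reads, e.g.:
#   docker run ... --volume "$REQ_FILE":/bitnami/python/requirements.txt ... bitnami/airflow-worker:latest
cat "$REQ_FILE"
```

Mount the file into every container that imports these modules. That is typically the scheduler, which parses the DAG files, and the workers, which execute the tasks.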

Environment variables

Customizable environment variables
Name | Description | Default value
AIRFLOW_USERNAME | Airflow username. | user
AIRFLOW_PASSWORD | Airflow password. | bitnami
AIRFLOW_FIRSTNAME | Airflow user's first name. | Firstname
AIRFLOW_LASTNAME | Airflow user's last name. | Lastname
AIRFLOW_EMAIL | Airflow user's email. | user@example.com
AIRFLOW_EXECUTOR | Airflow executor. | SequentialExecutor
AIRFLOW_EXECUTOR | Airflow executor. | CeleryExecutor
AIRFLOW_FORCE_OVERWRITE_CONF_FILE | Force generation of the airflow.cfg configuration file. | no
AIRFLOW_WEBSERVER_HOST | Airflow web server host. | 127.0.0.1
AIRFLOW_WEBSERVER_PORT_NUMBER | Airflow web server port. | 8080
AIRFLOW_LOAD_EXAMPLES | Load the example tasks into the application. | yes
AIRFLOW_HOSTNAME_CALLABLE | Method used to obtain the hostname. | socket.gethostname
AIRFLOW_DATABASE_HOST | Hostname of the PostgreSQL server. | postgresql
AIRFLOW_DATABASE_HOST | Hostname of the PostgreSQL server. | 127.0.0.1
AIRFLOW_DATABASE_PORT_NUMBER | Port used by the PostgreSQL server. | 5432
AIRFLOW_DATABASE_NAME | Name of the database Airflow connects to. | bitnami_airflow
AIRFLOW_DATABASE_USERNAME | Database user Airflow connects as. | bn_airflow
AIRFLOW_DATABASE_USE_SSL | Set to yes if the database uses SSL. | no
AIRFLOW_REDIS_USE_SSL | Set to yes if Redis® uses SSL. | no
REDIS_HOST | Hostname of the Redis® server. | redis
REDIS_HOST | Hostname of the Redis® server. | 127.0.0.1
REDIS_PORT_NUMBER | Port used by the Redis® server. | 6379
REDIS_DATABASE | Name of the Redis® database. | 1
AIRFLOW_LDAP_ENABLE | Enable LDAP authentication. | no
AIRFLOW_LDAP_USER_REGISTRATION | Allow user self-registration. | True
AIRFLOW_LDAP_ROLES_SYNC_AT_LOGIN | Replace all of the user's roles at every login (True), or only at registration (False). | True
AIRFLOW_LDAP_USE_TLS | Use LDAP SSL. | False
AIRFLOW_LDAP_ALLOW_SELF_SIGNED | Allow self-signed certificates for LDAP SSL. | True
Read-only environment variables
Name | Description | Value
AIRFLOW_BASE_DIR | Airflow installation directory. | ${BITNAMI_ROOT_DIR}/airflow
AIRFLOW_HOME | Airflow home directory. | ${AIRFLOW_BASE_DIR}
AIRFLOW_BIN_DIR | Airflow directory for binary executables. | ${AIRFLOW_BASE_DIR}/venv/bin
AIRFLOW_LOGS_DIR | Airflow logs directory. | ${AIRFLOW_BASE_DIR}/logs
AIRFLOW_SCHEDULER_LOGS_DIR | Airflow scheduler logs directory. | ${AIRFLOW_LOGS_DIR}/scheduler
AIRFLOW_LOG_FILE | Airflow log file. | ${AIRFLOW_LOGS_DIR}/airflow-webserver.log
AIRFLOW_CONF_FILE | Airflow configuration file. | ${AIRFLOW_BASE_DIR}/airflow.cfg
AIRFLOW_WEBSERVER_CONF_FILE | Airflow web server configuration file. | ${AIRFLOW_BASE_DIR}/webserver_config.py
AIRFLOW_TMP_DIR | Airflow directory for temporary files. | ${AIRFLOW_BASE_DIR}/tmp
AIRFLOW_PID_FILE | Path to the Airflow PID file. | ${AIRFLOW_TMP_DIR}/airflow-webserver.pid
AIRFLOW_DAGS_DIR | Airflow DAGs directory (data to be persisted). | ${AIRFLOW_BASE_DIR}/dags
AIRFLOW_DAEMON_USER | Airflow system user. | airflow
AIRFLOW_DAEMON_GROUP | Airflow system group. | airflow

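Beyond the variables above, any parameter in the configuration file can be overridden with an environment variable named AIRFLOW__{SECTION}__{KEY}. Note the double underscores.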
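To see which airflow.cfg option a given variable overrides, split its name at the double underscores. The section is the part after the AIRFLOW__ prefix and before the next "__", and the key is the rest. The env_to_cfg function below is a hypothetical helper that only illustrates this naming rule. It is not part of Airflow or the Bitnami image.

```shell
#!/bin/sh
# Sketch: map an AIRFLOW__{SECTION}__{KEY} variable name to its airflow.cfg option.
# env_to_cfg is a hypothetical helper for illustration only.
env_to_cfg() {
  name="$1"
  # Strip the leading "AIRFLOW__" prefix.
  rest="${name#AIRFLOW__}"
  if [ "$rest" = "$name" ]; then
    echo "not an AIRFLOW__ override: $name" >&2
    return 1
  fi
  # Section: everything before the next "__"; key: everything after it.
  section="${rest%%__*}"
  key="${rest#*__}"
  if [ "$section" = "$rest" ] || [ -z "$key" ]; then
    echo "missing __KEY part: $name" >&2
    return 1
  fi
  # airflow.cfg uses lower-case section and option names.
  section="$(printf '%s' "$section" | tr '[:upper:]' '[:lower:]')"
  key="$(printf '%s' "$key" | tr '[:upper:]' '[:lower:]')"
  printf '[%s] %s\n' "$section" "$key"
}

env_to_cfg AIRFLOW__SMTP__SMTP_HOST      # prints: [smtp] smtp_host
env_to_cfg AIRFLOW__CORE__LOAD_EXAMPLES  # prints: [core] load_examples
```

So AIRFLOW__SMTP__SMTP_HOST from the SMTP section of this article sets smtp_host under [smtp]. A value supplied through the environment takes precedence over the one in airflow.cfg.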

Specifying environment variables with Docker Compose
version: '2'

services:
  airflow:
    image: bitnami/airflow:latest
    environment:
      - AIRFLOW_FERNET_KEY=46BKJoQYlPPOexq0OhDZnIlNepKFf87WFwLbfzqDDho=
      - AIRFLOW_SECRET_KEY=a25mQ1FHTUh3MnFRSk5KMEIyVVU2YmN0VGRyYTVXY08=
      - AIRFLOW_EXECUTOR=CeleryExecutor
      - AIRFLOW_DATABASE_NAME=bitnami_airflow
      - AIRFLOW_DATABASE_USERNAME=bn_airflow
      - AIRFLOW_DATABASE_PASSWORD=bitnami1
      - AIRFLOW_PASSWORD=bitnami123
      - AIRFLOW_USERNAME=user
      - AIRFLOW_EMAIL=user@example.com
Specifying environment variables on the Docker command line
docker run -d --name airflow -p 8080:8080 \
    -e AIRFLOW_FERNET_KEY=46BKJoQYlPPOexq0OhDZnIlNepKFf87WFwLbfzqDDho= \
    -e AIRFLOW_SECRET_KEY=a25mQ1FHTUh3MnFRSk5KMEIyVVU2YmN0VGRyYTVXY08= \
    -e AIRFLOW_EXECUTOR=CeleryExecutor \
    -e AIRFLOW_DATABASE_NAME=bitnami_airflow \
    -e AIRFLOW_DATABASE_USERNAME=bn_airflow \
    -e AIRFLOW_DATABASE_PASSWORD=bitnami1 \
    -e AIRFLOW_PASSWORD=bitnami123 \
    -e AIRFLOW_USERNAME=user \
    -e AIRFLOW_EMAIL=user@example.com \
    bitnami/airflow:latest
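The Docker CLI examples in this article repeat the same -e flags for the web server, scheduler and worker. The sketch below keeps the shared settings in a single file instead. Both docker run --env-file and the Compose env_file: key read plain KEY=VALUE lines, and values are taken literally, so do not quote them. The file name airflow.env is a hypothetical choice. The values are this article's sample values and should be replaced.

```shell
#!/bin/sh
# Sketch: collect the settings shared by the web server, scheduler and worker
# in one env file. airflow.env is a hypothetical file name.
ENV_FILE="${ENV_FILE:-airflow.env}"

# Replace the sample keys and passwords with your own values (no quotes).
cat > "$ENV_FILE" <<'EOF'
AIRFLOW_FERNET_KEY=46BKJoQYlPPOexq0OhDZnIlNepKFf87WFwLbfzqDDho=
AIRFLOW_SECRET_KEY=a25mQ1FHTUh3MnFRSk5KMEIyVVU2YmN0VGRyYTVXY08=
AIRFLOW_EXECUTOR=CeleryExecutor
AIRFLOW_DATABASE_NAME=bitnami_airflow
AIRFLOW_DATABASE_USERNAME=bn_airflow
AIRFLOW_DATABASE_PASSWORD=bitnami1
EOF

# Each container then adds only what is specific to it, e.g.:
#   docker run -d --name airflow -p 8080:8080 --env-file "$ENV_FILE" \
#     -e AIRFLOW_USERNAME=user -e AIRFLOW_PASSWORD=bitnami123 \
#     -e AIRFLOW_EMAIL=user@example.com --net airflow-tier bitnami/airflow:latest
#   docker run -d --name airflow-scheduler --env-file "$ENV_FILE" \
#     -e AIRFLOW_WEBSERVER_HOST=airflow --net airflow-tier bitnami/airflow-scheduler:latest
cat "$ENV_FILE"
```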
SMTP configuration

To configure Airflow to send email over SMTP, set the following environment variables:

  • AIRFLOW__SMTP__SMTP_HOST: Host for outgoing SMTP email. Default: localhost
  • AIRFLOW__SMTP__SMTP_PORT: Port for outgoing SMTP email. Default: 25
  • AIRFLOW__SMTP__SMTP_STARTTLS: Use TLS (STARTTLS) communication. Default: True
  • AIRFLOW__SMTP__SMTP_SSL: Use SSL communication. Default: False
  • AIRFLOW__SMTP__SMTP_USER: SMTP user for authentication (usually an email address). No default.
  • AIRFLOW__SMTP__SMTP_PASSWORD: SMTP password. No default.
  • AIRFLOW__SMTP__SMTP_MAIL_FROM: The "from" email address. Default: airflow@example.com

Here is an example SMTP configuration:

  • docker-compose (application section):
  airflow:
    image: bitnami/airflow:latest
    environment:
      - AIRFLOW_FERNET_KEY=46BKJoQYlPPOexq0OhDZnIlNepKFf87WFwLbfzqDDho=
      - AIRFLOW_SECRET_KEY=a25mQ1FHTUh3MnFRSk5KMEIyVVU2YmN0VGRyYTVXY08=
      - AIRFLOW_EXECUTOR=CeleryExecutor
      - AIRFLOW_DATABASE_NAME=bitnami_airflow
      - AIRFLOW_DATABASE_USERNAME=bn_airflow
      - AIRFLOW_DATABASE_PASSWORD=bitnami1
      - AIRFLOW_PASSWORD=bitnami
      - AIRFLOW_USERNAME=user
      - AIRFLOW_EMAIL=user@email.com
      - AIRFLOW__SMTP__SMTP_HOST=smtp.gmail.com
      - AIRFLOW__SMTP__SMTP_USER=your_email@gmail.com
      - AIRFLOW__SMTP__SMTP_PASSWORD=your_password
      - AIRFLOW__SMTP__SMTP_PORT=587
    ports:
      - '8080:8080'
  • Docker command line:
docker run -d --name airflow -p 8080:8080 \
    -e AIRFLOW_FERNET_KEY=46BKJoQYlPPOexq0OhDZnIlNepKFf87WFwLbfzqDDho= \
    -e AIRFLOW_SECRET_KEY=a25mQ1FHTUh3MnFRSk5KMEIyVVU2YmN0VGRyYTVXY08= \
    -e AIRFLOW_EXECUTOR=CeleryExecutor \
    -e AIRFLOW_DATABASE_NAME=bitnami_airflow \
    -e AIRFLOW_DATABASE_USERNAME=bn_airflow \
    -e AIRFLOW_DATABASE_PASSWORD=bitnami1 \
    -e AIRFLOW_PASSWORD=bitnami123 \
    -e AIRFLOW_USERNAME=user \
    -e AIRFLOW_EMAIL=user@example.com \
    -e AIRFLOW__SMTP__SMTP_HOST=smtp.gmail.com \
    -e AIRFLOW__SMTP__SMTP_USER=your_email@gmail.com \
    -e AIRFLOW__SMTP__SMTP_PASSWORD=your_password \
    -e AIRFLOW__SMTP__SMTP_PORT=587 \
    bitnami/airflow:latest