
Great Expectations
Great Expectations website: https://greatexpectations.io/.
Great Expectations documentation: https://docs.greatexpectations.io/docs/.
Great Expectations GitHub: https://github.com/great-expectations/great_expectations.
1.Introduction
1.1 Introduction
- It comes down to three things: validating, documenting, and profiling data.
1.2 Slack
- You can ask questions in the Slack community.
1.3 Integrations
1.4 What does Great Expectations not do
2.Install
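- Install great_expectations
sudo pip3 install great_expectations
- Check the install path
sudo python3 -m site
=> in my case the executable ended up in /usr/local/python3/bin
- Check the version
/usr/local/python3/bin/great_expectations --version
- Initialize the project (init)
/usr/local/python3/bin/great_expectations init
- Create a symlink so the command works without the full path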
sudo ln -s /usr/local/python3/bin/great_expectations /usr/bin/great_expectations
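Rather than reading paths out of `python3 -m site`, you can ask the interpreter directly where pip puts console scripts such as `great_expectations`, and whether that directory is already on PATH. A minimal standard-library sketch, assuming a regular system-wide install like the `sudo pip3` one above (a `--user` install lands elsewhere); `scripts_dir` and `is_on_path` are my own helper names.

```python
import os
import sysconfig


def scripts_dir() -> str:
    """Directory where this interpreter's pip installs console scripts."""
    return sysconfig.get_path("scripts")


def is_on_path(directory: str, path_env: str) -> bool:
    """True if `directory` is one of the entries of a PATH-style string."""
    wanted = os.path.normpath(directory)
    return any(
        os.path.normpath(entry) == wanted
        for entry in path_env.split(os.pathsep)
        if entry  # skip empty entries such as a trailing separator
    )


if __name__ == "__main__":
    bin_dir = scripts_dir()
    print("console scripts go to:", bin_dir)
    if is_on_path(bin_dir, os.environ.get("PATH", "")):
        print("already on PATH, no symlink needed")
    else:
        target = os.path.join(bin_dir, "great_expectations")
        print(f"not on PATH, e.g.: sudo ln -s {target} /usr/bin/great_expectations")
```

If the check says the directory is already on PATH, the symlink step can be skipped.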
3.Connect to Data
3.1 Connect File
- Create a new datasource
great_expectations datasource new --no-jupyter
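enter option 1 (files on a filesystem; it seems to support only CSV)
enter option 1 (process the files with Pandas)
Test data
Put the test .csv file into the data directory, then run this from the directory that contains data:
great_expectations datasource new --no-jupyter
- Follow the prompts
jupyter notebook /home/os-nan.zhao/great_expectations/uncommitted/datasource_new.ipynb --allow-root --ip 0.0.0.0
- Open the notebook URL in a browser
datasource_new.ipynb is the generated configuration notebook.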
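As far as I can tell, the file datasource uses an inferred connector that scans the data directory and turns each file name matching a regex into a data asset name. The CLI asks for that directory relative to where you run it, which is probably why it is run next to data. The sketch below imitates only the file-name-to-asset mapping in plain Python; the pattern `(.*)\.csv`, the file names and `infer_assets` are assumptions for illustration, not Great Expectations code.

```python
from __future__ import annotations

import csv
import re
import tempfile
from pathlib import Path

# Hypothetical pattern: the file name without ".csv" becomes the asset name.
ASSET_PATTERN = re.compile(r"(.*)\.csv")


def infer_assets(data_dir: Path) -> dict[str, Path]:
    """Map asset name -> file for every file in data_dir matching ASSET_PATTERN."""
    assets = {}
    for path in sorted(data_dir.iterdir()):
        match = ASSET_PATTERN.fullmatch(path.name)
        if path.is_file() and match:
            assets[match.group(1)] = path
    return assets


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as project:
        data_dir = Path(project) / "data"  # data/ sits in the directory where the CLI runs
        data_dir.mkdir()
        with open(data_dir / "test_data.csv", "w", newline="") as f:
            csv.writer(f).writerows([["id", "name"], [1, "a"], [2, "b"]])
        (data_dir / "notes.txt").write_text("not a CSV, so not an asset")
        print(infer_assets(data_dir))
```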
3.2 Connect DB
- Connect to the database
great_expectations datasource new --no-jupyter
enter option 2 (relational database)
enter option 1 => I'm using MySQL
- I'm on Python 3, so I had to install this package manually:
sudo pip3 install psycopg2-binary
- Then re-run the previous step
great_expectations datasource new --no-jupyter
- Follow the prompts
jupyter notebook /home/os-nan.zhao/great_expectations/uncommitted/datasource_new.ipynb --allow-root --ip 0.0.0.0
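- Open the URL that Jupyter prints in the terminal
- Enter the token, then set a new password
datahub@123
- Open datasource_new.ipynb
- Install the drivers for the databases you need (PyMySQL for MySQL, pymssql for SQL Server)
sudo pip3 install pymysql
sudo pip3 install pymssql
- Give the datasource a name so it's easy to tell apart
- Configure the database connection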
In the notebook, step 3 puts the DB connection info into variables, and step 4 passes those variables into the datasource YAML:
#---- Step 3: put the DB connection info into variables ----
datasource_name = "my_mysql_datasource"  # hypothetical; use the name chosen in the naming step
host = "YOUR_HOST"
# Pick ONE driver/port pair: MySQL 3306, SQL Server 1433, Oracle 1521
drivername = "mysql+pymysql"  # or "mssql+pymssql" / "oracle+cx_oracle"
port = "3306"  # or "1433" / "1521", matching drivername
username = "YOUR_USERNAME"
password = "YOUR_PASSWORD"
database = "YOUR_DATABASE"
schema_name = "YOUR_SCHEMA"
#---- Step 4: pass the variables into the datasource YAML ----
example_yaml = f"""
name: {datasource_name}
class_name: Datasource
execution_engine:
  class_name: SqlAlchemyExecutionEngine
  credentials:
    host: {host}
    port: '{port}'
    username: {username}
    password: {password}
    database: {database}
    schema_name: {schema_name}
    drivername: {drivername}
data_connectors:
  default_runtime_data_connector_name:
    class_name: RuntimeDataConnector
    batch_identifiers:
      - default_identifier_name
  default_inferred_data_connector_name:
    class_name: InferredAssetSqlDataConnector
    include_schema_name: True
"""
print(example_yaml)
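The `drivername` values follow SQLAlchemy's `dialect+driver` URL scheme, and as I understand it the execution engine turns these credentials into such a URL. As a quick sanity check, the same credentials can be assembled into a URL by hand. A standard-library sketch, assuming the default ports listed above; `build_url` is a hypothetical helper, and in real code SQLAlchemy's own `URL.create` handles the escaping. A password containing `@` (like `datahub@123` above) must be percent-encoded, otherwise the URL gets split at the wrong `@`.

```python
from urllib.parse import quote_plus

# Usual default port for each drivername used in the YAML above.
DEFAULT_PORTS = {
    "mysql+pymysql": 3306,
    "mssql+pymssql": 1433,
    "oracle+cx_oracle": 1521,
}


def build_url(drivername, username, password, host, database, port=None):
    """Assemble dialect+driver://user:password@host:port/database."""
    if drivername not in DEFAULT_PORTS:
        raise ValueError(f"unknown drivername: {drivername}")
    port = port or DEFAULT_PORTS[drivername]
    user = quote_plus(username)
    secret = quote_plus(password)  # '@', ':' and '/' would otherwise break the URL
    return f"{drivername}://{user}:{secret}@{host}:{port}/{database}"


if __name__ == "__main__":
    print(build_url("mysql+pymysql", "ge_user", "datahub@123", "db.example.com", "sales"))
    # mysql+pymysql://ge_user:datahub%40123@db.example.com:3306/sales
```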
3.3 Database and File
Expectations gallery (shows which expectations each backend supports): https://greatexpectations.io/expectations/.
- Note: Oracle is not listed.
4.Create Expectation
4.1 create expectation
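- In a separate terminal window, continue with:
great_expectations suite new
- Select options
enter option 3
enter option 1
- Enter the file name
enter Expectation name
- Open Jupyter Notebook
Go straight to the URL shown, or reuse the notebook server URL from the DB step above
- Argh, port 8889 wasn't open
On this datahub machine it's really hard to open every port in advance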
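- Three modes
Each mode generates a different configuration notebook:
The first only fetches the DB data; there is no validator
The second fetches the DB data and initializes a validator
The third fetches the DB data, initializes a validator, and also configures a profiler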
- Profiler config
- using semantic_types_dict
- Open the expectation notebook from the CLI
jupyter notebook /great_expectations/uncommitted/xxxxx.ipynb --allow-root --ip 0.0.0.0
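Roughly, the third mode lets a profiler look at the data and draft expectations for you, and `semantic_types_dict` tells it which columns to treat as, for example, numeric or as a fixed set of values. The sketch below imitates that idea in plain Python so it runs without Great Expectations. The expectation names are real GX expectations (`expect_column_values_to_be_between`, `expect_column_values_to_be_in_set`), but `profile` and its one-rule-per-type logic are my own simplification; the library's profiler generates a different and larger set.

```python
from __future__ import annotations


def profile(rows: list[dict], semantic_types_dict: dict[str, list[str]]) -> list[dict]:
    """Draft expectation configs from sample rows, one simple rule per semantic type."""
    suite = []
    # Numeric columns: expect future values within the observed min/max.
    for column in semantic_types_dict.get("numeric", []):
        values = [r[column] for r in rows if r[column] is not None]
        suite.append({
            "expectation_type": "expect_column_values_to_be_between",
            "kwargs": {"column": column, "min_value": min(values), "max_value": max(values)},
        })
    # Value-set columns: expect only the values seen in the sample.
    for column in semantic_types_dict.get("value_set", []):
        observed = sorted({r[column] for r in rows if r[column] is not None})
        suite.append({
            "expectation_type": "expect_column_values_to_be_in_set",
            "kwargs": {"column": column, "value_set": observed},
        })
    return suite


rows = [
    {"age": 31, "status": "active"},
    {"age": 45, "status": "closed"},
    {"age": 28, "status": None},
]
for expectation in profile(rows, {"numeric": ["age"], "value_set": ["status"]}):
    print(expectation)
```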
4.2 create custom expectation
- See the official docs
Custom Expectations: https://docs.greatexpectations.io/docs/guides/expectations/creating_custom_expectations/overview.
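- The docs provide several demos, and the GitHub repo has template files; copy one to the designated location and rename it
- Run it with python
- Change the class name in the template
- The template comes with test data; run the tests, then run the file with python once more
- Edit the metric name and add the metric
- Lint the template: run black, isort, flake8, and pyupgrade on template.py
- Run it with python again and make sure every item is checked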
- Use it
- Community-contributed (open-source) expectations
Open-source expectations: https://greatexpectations.io/expectations/.
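The custom-expectation template pairs a per-row condition (the metric) with a list of built-in examples that get checked when you run the file with python, which is where the checkmarks come from. The sketch below imitates that shape in plain Python so the idea can be tried without the library; `ExpectColumnValuesToBeEven`, `validate` and `self_test` are hypothetical names, and the real template subclasses GX base classes such as `ColumnMapExpectation` instead. The `mostly` threshold (the fraction of non-null values that must pass) mirrors the parameter GX expectations accept.

```python
class ExpectColumnValuesToBeEven:
    """Plain-Python imitation of a column map expectation (not the GX base class)."""

    @staticmethod
    def condition(value) -> bool:
        """Per-row check, playing the role of the metric in the template."""
        return value % 2 == 0

    # Test cases in the spirit of the template's examples list.
    examples = [
        {"data": [0, 2, 4, 6], "kwargs": {}, "success": True},
        {"data": [1, 2, 3, 4], "kwargs": {}, "success": False},
        {"data": [1, 2, 4, 6], "kwargs": {"mostly": 0.7}, "success": True},
    ]

    @classmethod
    def validate(cls, values, mostly: float = 1.0) -> dict:
        """Nulls are skipped; success if the passing fraction reaches `mostly`."""
        non_null = [v for v in values if v is not None]
        unexpected = [v for v in non_null if not cls.condition(v)]
        fraction_ok = 1 - len(unexpected) / len(non_null) if non_null else 1.0
        return {"success": fraction_ok >= mostly, "unexpected_list": unexpected}

    @classmethod
    def self_test(cls) -> bool:
        """Run every example and print a checkmark per case."""
        all_passed = True
        for i, example in enumerate(cls.examples):
            result = cls.validate(example["data"], **example["kwargs"])
            passed = result["success"] == example["success"]
            all_passed = all_passed and passed
            print(f"{'✔' if passed else '✘'} example {i}")
        return all_passed


if __name__ == "__main__":
    print("all checks passed:", ExpectColumnValuesToBeEven.self_test())
```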
5.Create Checkpoint
5.1 create checkpoint
- create
great_expectations checkpoint new my_checkpoint
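- Edit it, then validate it
All of this happens in the UI (the Jupyter notebook)
- Save the edited checkpoint
- Run it
- The main point is that one checkpoint can bundle multiple expectations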
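Conceptually, a checkpoint pairs batches of data with expectation suites, runs them all, and reports one overall result. The plain-Python sketch below imitates only that aggregation step (the checkpoint succeeds only if every check in every validation passes); `run_checkpoint` and the dict-of-lambdas suite format are hypothetical, not the GX API. The real checkpoint is run from the CLI with `great_expectations checkpoint run my_checkpoint`.

```python
from __future__ import annotations

from typing import Callable

# Hypothetical check format: a function from a batch (list of rows) to pass/fail.
Check = Callable[[list], bool]


def run_checkpoint(validations: list[tuple[str, list[dict], dict[str, Check]]]) -> dict:
    """Validate each (batch name, rows, suite); pass only if every check passes."""
    results = {}
    for batch_name, rows, suite in validations:
        results[batch_name] = {name: check(rows) for name, check in suite.items()}
    success = all(all(checks.values()) for checks in results.values())
    return {"success": success, "results": results}


# Usage: two batches, each validated against its own suite.
orders = [{"id": 1, "amount": 10.0}, {"id": 2, "amount": -5.0}]
users = [{"id": 1, "email": "a@example.com"}]
orders_suite = {
    "id_not_null": lambda rows: all(r["id"] is not None for r in rows),
    "amount_non_negative": lambda rows: all(r["amount"] >= 0 for r in rows),
}
users_suite = {"email_present": lambda rows: all(r.get("email") for r in rows)}

result = run_checkpoint([("orders", orders, orders_suite), ("users", users, users_suite)])
print(result["success"])  # False: one order has a negative amount
```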
5.2 Email Notification
- Email notifications can be configured for each expectation suite's validation, and actions can also send notifications to Slack or Opsgenie
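If I read the docs right, the notification actions in a checkpoint's action list (for example `EmailAction`, `SlackNotificationAction`, `OpsgenieAlertAction`) take a `notify_on` setting of `all`, `success` or `failure`. The sketch below reproduces that filter and builds the alert email with the standard library's `email.message.EmailMessage`; actually sending it would go through `smtplib`. `should_notify`, `build_alert` and the addresses are hypothetical.

```python
from __future__ import annotations

from email.message import EmailMessage

VALID_NOTIFY_ON = {"all", "success", "failure"}


def should_notify(notify_on: str, success: bool) -> bool:
    """'all' always notifies; 'success'/'failure' only on the matching outcome."""
    if notify_on not in VALID_NOTIFY_ON:
        raise ValueError(f"notify_on must be one of {sorted(VALID_NOTIFY_ON)}")
    return notify_on == "all" or (notify_on == "success") == success


def build_alert(checkpoint: str, success: bool, sender: str, receivers: list[str]) -> EmailMessage:
    """Build (but do not send) a short status email for a checkpoint run."""
    msg = EmailMessage()
    msg["Subject"] = f"[{'OK' if success else 'FAILED'}] checkpoint {checkpoint}"
    msg["From"] = sender
    msg["To"] = ", ".join(receivers)
    msg.set_content(f"Checkpoint {checkpoint} finished with success={success}.")
    return msg


if __name__ == "__main__":
    if should_notify("failure", success=False):
        alert = build_alert("my_checkpoint", False, "ge@example.com", ["team@example.com"])
        print(alert)
        # To send: smtplib.SMTP(host, port) -> starttls() -> login() -> send_message(alert)
```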
5.3 schedule
Cron Schedule: https://docs.greatexpectations.io/docs/guides/validation/advanced/how_to_deploy_a_scheduled_checkpoint_with_cron.
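The linked guide schedules a checkpoint by having cron call the CLI. A small sketch that builds such a crontab entry; it assumes `great_expectations checkpoint run <name>` is run from the project directory that contains great_expectations/, and that the /usr/bin symlink from the install step exists. The schedule, project path and log file are placeholders, and `crontab_line` is a hypothetical helper.

```python
import shlex


def crontab_line(schedule: str, project_dir: str, checkpoint: str,
                 cli: str = "/usr/bin/great_expectations",
                 log_file: str = "/tmp/ge_checkpoint.log") -> str:
    """One crontab entry: cd into the project, run the checkpoint, append output to a log."""
    if len(schedule.split()) != 5:
        raise ValueError("cron schedule needs 5 fields: minute hour day month weekday")
    command = (
        f"cd {shlex.quote(project_dir)} && "
        f"{shlex.quote(cli)} checkpoint run {shlex.quote(checkpoint)} "
        f">> {shlex.quote(log_file)} 2>&1"
    )
    return f"{schedule} {command}"


if __name__ == "__main__":
    # Every hour on the hour; paste the printed line into `crontab -e`.
    print(crontab_line("0 * * * *", "/home/os-nan.zhao", "my_checkpoint"))
```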
5.4 result
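As far as I know, validation results are written as JSON files under great_expectations/uncommitted/validations/ (nested by suite, run name, run time and batch), and each file has a top-level `success` flag plus a `statistics` object. The sketch below walks such a folder and prints one summary per result; the folder layout and the statistics keys (`evaluated_expectations`, `unsuccessful_expectations`, `success_percent`) are assumptions to check against your own files, and `summarize_results` is a hypothetical helper.

```python
import json
from pathlib import Path


def summarize_results(validations_dir: Path) -> list:
    """One summary dict per validation-result JSON file below validations_dir."""
    rows = []
    for path in sorted(validations_dir.rglob("*.json")):
        result = json.loads(path.read_text(encoding="utf-8"))
        stats = result.get("statistics", {})
        rows.append({
            "file": str(path.relative_to(validations_dir)),
            "success": result.get("success"),
            "evaluated": stats.get("evaluated_expectations"),
            "failed": stats.get("unsuccessful_expectations"),
            "success_percent": stats.get("success_percent"),
        })
    return rows


if __name__ == "__main__":
    target = Path("great_expectations/uncommitted/validations")
    if target.is_dir():
        for row in summarize_results(target):
            print(row)
    else:
        print(f"{target} not found; run this from the project directory")
```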
6.Waken
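Someone who sees the essence of a thing in a second and someone who spends half a lifetime without seeing it clearly are, naturally, destined for different fates.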