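Privacy-preserving computation with FATE using hetero (vertical) linear regression
There are almost no step-by-step tutorials on CSDN for hands-on vertical (hetero) federated learning.
After many painful failures I finally got it working.
I'm sharing it here in the hope that it helps.
Special thanks to the blogger 嘉然然. I followed his two-party deployment tutorial to get this done. Thanks also to the blogger HERODING23.
This is a mostly complete tutorial. It started as the README we had to submit for a competition, and after finishing it I honestly couldn't stand looking at it anymore, so some problems may remain. Please cross-check with the excellent bloggers above.
If anything is unclear, feel free to contact me. Thanks, everyone.
I used a CentOS 7 Linux virtual machine and uploaded files manually. Use whatever upload method works for you.
Docker and docker-compose deployment are covered on 嘉然然's blog.
Our team used the official hetero_linear_regression example files for modeling and training.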
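1. Data processing
1.1 Processing A's data
Processing script:
A_usedata.py
- Company A data
- Training data: housing_2_train.csv
- Evaluation (prediction) data: housing_2_eval.csv
1.2 Processing B's data
Processing script:
B_usedata.py
- Company B data
- Training data: housing_1_train.csv
- Evaluation (prediction) data: housing_1_eval.csv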
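The contents of A_usedata.py and B_usedata.py are not shown here. The hypothetical sketch below only illustrates the kind of preprocessing a vertical (hetero) setup needs. Both parties keep a shared `id` column. Only one party keeps the label `y`. Each party's rows are split into train and eval files. The column names, the 80/20 split, and all function names are assumptions, not the scripts used in this tutorial.

```python
import csv
import io
import random


def vertical_split(rows, label, guest_cols, host_cols):
    """Split full rows (dicts keyed by column name) into two parties' views.

    The guest keeps the id, the label and its own features; the host keeps
    only the id and its own features (no label), as in hetero FATE jobs.
    """
    guest = [{"id": r["id"], label: r[label], **{c: r[c] for c in guest_cols}} for r in rows]
    host = [{"id": r["id"], **{c: r[c] for c in host_cols}} for r in rows]
    return guest, host


def train_eval_split(rows, eval_ratio=0.2, seed=0):
    """Split rows into train/eval by id, deterministically.

    Sorting the ids before a seeded shuffle means both parties get the same
    id split when they run this on their own columns with the same seed.
    """
    ids = sorted(r["id"] for r in rows)
    random.Random(seed).shuffle(ids)
    eval_ids = set(ids[: int(len(ids) * eval_ratio)])
    train = [r for r in rows if r["id"] not in eval_ids]
    evaluation = [r for r in rows if r["id"] in eval_ids]
    return train, evaluation


def to_csv(rows):
    """Serialize rows to CSV text with a header line (matches "head": 1 at upload)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

The important design point is that both sides must split on the same ids. Otherwise the intersection step later finds fewer common samples than expected.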
2. Data upload
- upload_data.json
Repeat the following on both machines (both IPs), uploading each of the 4 CSV files separately:
{
"file": "/data/projects/fate/python/examples/mydata/housing_2_train.csv",
"table_name": "homo_housing_2_train",
"namespace": "homo_host_housing_train",
"head": 1,
"partition": 16,
"work_mode": 1
}
{
"file": "/data/projects/fate/python/examples/mydata/housing_2_eval.csv",
"table_name": "homo_housing_2_eval",
"namespace": "homo_host_housing_eval",
"head": 1,
"partition": 16,
"work_mode": 1
}
{
"file": "/data/projects/fate/python/examples/mydata/housing_1_train.csv",
"table_name": "homo_housing_1_train",
"namespace": "homo_guest_housing_train",
"head": 1,
"partition": 16,
"work_mode": 1
}
{
"file": "/data/projects/fate/python/examples/mydata/housing_1_eval.csv",
"table_name": "homo_housing_1_eval",
"namespace": "homo_guest_housing_eval",
"head": 1,
"partition": 16,
"work_mode": 1
}
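Writing four near-identical upload configs by hand is error-prone. JSON, for example, does not allow a trailing comma after the last field. The sketch below generates one config file per table and prints the matching upload commands. The `fate_flow_client.py -f upload -c <conf>` form is the FATE 1.x flow client usage as I understand it, so check it against your version. The helper names and the output file names are hypothetical.

```python
import json

DATA_DIR = "/data/projects/fate/python/examples/mydata"
FLOW_CLIENT = "/data/projects/fate/python/fate_flow/fate_flow_client.py"

# (csv file, table_name, namespace) for the four uploads listed above
UPLOADS = [
    ("housing_2_train.csv", "homo_housing_2_train", "homo_host_housing_train"),
    ("housing_2_eval.csv", "homo_housing_2_eval", "homo_host_housing_eval"),
    ("housing_1_train.csv", "homo_housing_1_train", "homo_guest_housing_train"),
    ("housing_1_eval.csv", "homo_housing_1_eval", "homo_guest_housing_eval"),
]


def upload_conf(csv_name, table_name, namespace, partition=16, work_mode=1):
    """Build one upload config; head=1 means the CSV's first line is a header."""
    return {
        "file": f"{DATA_DIR}/{csv_name}",
        "table_name": table_name,
        "namespace": namespace,
        "head": 1,
        "partition": partition,
        "work_mode": work_mode,
    }


def write_upload_confs(prefix="upload_"):
    """Write one JSON file per table and return the shell commands to upload them."""
    commands = []
    for csv_name, table, ns in UPLOADS:
        path = f"{prefix}{table}.json"
        with open(path, "w") as f:
            json.dump(upload_conf(csv_name, table, ns), f, indent=4)
        commands.append(f"python {FLOW_CLIENT} -f upload -c {path}")
    return commands


if __name__ == "__main__":
    for cmd in write_upload_confs():
        print(cmd)
```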
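Environment setup (done on host A)
1. Image packages
# We use the official image package and add our own data and parameter settings
docker load -i fate_1.3.0-images.tar.gz
tar -xzf kubefate-docker-compose.tar.gz     # extract the deployment package
cd docker-deploy/                           # enter the docker-deploy directory
vi parties.conf                             # edit the parties.conf configuration file
Contents of parties.conf (bash syntax, so comments must start with #, not //):
user=root
dir=/data/projects/fate
partylist=(10000 9999)                          # IDs of the two clusters
partyiplist=(111.118.16.129 112.168.16.130)     # IPs of the two target machines
servingiplist=(112.118.16.129 112.168.16.130)   # IPs of the two target machines
exchangeip=
Then generate the configuration and deploy:
bash generate_config.sh                     # generate the deployment files
bash docker_deploy.sh all                   # deploy and start both clusters (asks for the root password)
docker exec -it confs-10000_python_1 bash   # enter the python container of party 10000
cd /data/projects/fate/python/examples/toy_example    # enter the test script directory
python run_toy_example.py 10000 9999 1      # guest 10000, host 9999; 1 = cluster (multi-machine) mode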
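In parties.conf the three arrays are matched by position. The i-th entries of `partylist`, `partyiplist` and `servingiplist` describe the same party. The hypothetical helper below renders the file from a single list of parties, so the arrays cannot drift out of order. The field names mirror the file above. The helper itself is not part of KubeFATE, and the IPs in the usage are placeholders.

```python
from dataclasses import dataclass


@dataclass
class Party:
    party_id: int
    ip: str          # machine running this party's training cluster
    serving_ip: str  # machine running this party's serving (online inference) node


def render_parties_conf(parties, user="root", fate_dir="/data/projects/fate", exchange_ip=""):
    """Render parties.conf so the three bash arrays stay index-aligned."""
    ids = " ".join(str(p.party_id) for p in parties)
    ips = " ".join(p.ip for p in parties)
    serving = " ".join(p.serving_ip for p in parties)
    lines = [
        f"user={user}",
        f"dir={fate_dir}",
        f"partylist=({ids})",
        f"partyiplist=({ips})",
        f"servingiplist=({serving})",
        f"exchangeip={exchange_ip}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_parties_conf([
        Party(10000, "10.0.0.1", "10.0.0.1"),  # placeholder IPs
        Party(9999, "10.0.0.2", "10.0.0.2"),
    ]))
```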
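3. Sample alignment
Sample alignment computes the intersection of the two parties' users without revealing either party's data. This determines the dataset used for model training.
FATE provides a secure multi-party sample alignment algorithm based on RSA encryption and hash functions. When modeling with FATE you do not need to implement sample alignment yourself, because FATE provides it as a component (Intersection) for model training.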
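To make the RSA-plus-hash idea concrete, here is a toy, single-process sketch of an RSA blind-signature intersection. The host holds an RSA key. The guest blinds its hashed ids. The host signs them. Both sides then compare a second hash of the signatures, so the guest learns only which of its ids the host also has. It uses tiny fixed primes and SHA-256 purely for illustration. It is not FATE's Intersection implementation and is not secure as written.

```python
import hashlib
import math
import secrets

# Toy RSA key held by the host (tiny primes, illustration only).
P, Q, E = 1000003, 1000033, 65537
N = P * Q
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (needs Python 3.8+)


def h1(x):
    """First hash: map an id into Z_N."""
    return int.from_bytes(hashlib.sha256(x.encode()).digest(), "big") % N


def h2(v):
    """Second hash, applied to RSA signatures before they are compared."""
    return hashlib.sha256(str(v).encode()).hexdigest()


def guest_blind(ids):
    """Guest: blind each hashed id with a random r coprime to N: h1(id) * r^E mod N."""
    blinded, factors = [], []
    for x in ids:
        while True:
            r = secrets.randbelow(N - 2) + 2
            if math.gcd(r, N) == 1:
                break
        blinded.append(h1(x) * pow(r, E, N) % N)
        factors.append(r)
    return blinded, factors


def host_sign(values):
    """Host: sign with the private key, v^D mod N (it never sees h1(id) itself)."""
    return [pow(v, D, N) for v in values]


def guest_unblind(signed, factors):
    """Guest: (h1 * r^E)^D = h1^D * r mod N, so multiply by r^-1 and hash again."""
    return [h2(s * pow(r, -1, N) % N) for s, r in zip(signed, factors)]


def private_intersection(guest_ids, host_ids):
    blinded, factors = guest_blind(guest_ids)
    guest_tags = guest_unblind(host_sign(blinded), factors)
    # Host sends h2(h1(id)^D) for its own ids; guest keeps only matching tags.
    host_tags = {h2(pow(h1(x), D, N)) for x in host_ids}
    return {x for x, t in zip(guest_ids, guest_tags) if t in host_tags}


if __name__ == "__main__":
    print(private_intersection(["u1", "u2", "u3", "u5"], ["u2", "u3", "u4"]))
```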
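4. Model training
In FATE, training a model essentially comes down to editing two files: dsl.json and conf.json.
Go to /data/projects/fate/python/examples/federatedml-1.x-examples/hetero_linear_regression. This directory already contains many predefined dsl and conf files. Modify the two files below.
The path may differ on your installation, so look for it yourself.
- test_hetero_linr_train_job_dsl.json: describes the task modules and combines them into a directed acyclic graph (DAG).
- test_hetero_linr_train_job_conf.json: sets each component's parameters, such as the input table names for the data module and the learning rate, batch size, and number of iterations for the algorithm module.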
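4.1 Modify the DSL file (on guest machine B)
First, look at the DSL file. It uses four kinds of component modules (dataio_1 and intersection_1 do the same job for the evaluation data):
- dataio_0: data I/O component that converts local data into a DTable
- intersection_0: sample alignment component that computes the intersection of the two parties' data
- hetero_linr_0: hetero (vertical) linear regression model component
- evaluation_0: model evaluation component. If no test dataset is provided, the training dataset is automatically used as the test dataset.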
{
"components" : {
"dataio_0": {
"module": "DataIO",
"input": {
"data": {
"data": ["args.train_data"]
}
},
"output": {
"data": ["train"],
"model": ["dataio"]
}
},
"dataio_1": {
"module": "DataIO",
"input": {
"data": {
"data": ["args.eval_data"]
},
"model": ["dataio_0.dataio"]
},
"output": {
"data": ["eval"],
"model": ["dataio"]
}
},
"intersection_0": {
"module": "Intersection",
"input": {
"data": {
"data": ["dataio_0.train"]
}
},
"output": {
"data": ["train"]
}
},
"intersection_1": {
"module": "Intersection",
"input": {
"data": {
"data": ["dataio_1.eval"]
}
},
"output": {
"data": ["eval"]
}
},
"hetero_linr_0": {
"module": "HeteroLinR",
"input": {
"data": {
"train_data": ["intersection_0.train"],
"eval_data": ["intersection_1.eval"]
}
},
"output": {
"data": ["train"],
"model": ["hetero_linr"]
}
},
"evaluation_0": {
"module": "Evaluation",
"input": {
"data": {
"data": ["hetero_linr_0.train"]
}
}
}
}
}
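The DSL above wires components together by name. An input such as `"intersection_0.train"` means "the `train` output of `intersection_0`". Entries starting with `args.` are the tables supplied by the conf. The sketch below is not FATE's own parser. It derives an execution order from those references with a topological sort (Kahn's algorithm) over a trimmed copy of the DSL that keeps only the `input` sections.

```python
DSL = {
    "components": {
        "dataio_0": {"input": {"data": {"data": ["args.train_data"]}}},
        "dataio_1": {"input": {"data": {"data": ["args.eval_data"]},
                               "model": ["dataio_0.dataio"]}},
        "intersection_0": {"input": {"data": {"data": ["dataio_0.train"]}}},
        "intersection_1": {"input": {"data": {"data": ["dataio_1.eval"]}}},
        "hetero_linr_0": {"input": {"data": {"train_data": ["intersection_0.train"],
                                             "eval_data": ["intersection_1.eval"]}}},
        "evaluation_0": {"input": {"data": {"data": ["hetero_linr_0.train"]}}},
    }
}


def upstream(component):
    """Names of the components this one reads data or models from."""
    inputs = component.get("input", {})
    refs = [ref for refs in inputs.get("data", {}).values() for ref in refs]
    refs += inputs.get("model", [])
    return {ref.split(".")[0] for ref in refs if not ref.startswith("args.")}


def execution_order(dsl):
    """Kahn's algorithm: repeatedly run every component whose inputs are ready."""
    deps = {name: upstream(c) for name, c in dsl["components"].items()}
    order = []
    while deps:
        ready = sorted(n for n, d in deps.items() if not d)
        if not ready:
            # a cycle, or a reference to a component name that does not exist
            raise ValueError(f"unresolvable inputs among: {sorted(deps)}")
        for n in ready:
            order.append(n)
            del deps[n]
        for d in deps.values():
            d.difference_update(ready)
    return order


if __name__ == "__main__":
    print(execution_order(DSL))
```

A misspelled component name in an input reference never becomes "ready", so the sort fails loudly instead of silently skipping a step.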
4.2 Modify the conf file
- You only need to change the party IDs, data sources, label_name, and model parameters
{
"initiator": {
"role": "guest",
"party_id": 9999
},
"job_parameters": {
"work_mode": 1
},
"role": {
"guest": [9999],
"host": [10000],
"arbiter": [10000]
},
"role_parameters": {
"guest": {
"args": {
"data": {
"train_data": [
{
"name": "homo_housing_2_train",
"namespace": "homo_host_housing_train"
}
],
"eval_data": [
{
"name": "homo_housing_2_eval",
"namespace": "homo_host_housing_eval"
}
]
}
},
"dataio_0": {
"with_label": [true],
"label_name": ["y"],
"label_type": ["float"],
"output_format": ["dense"],
"missing_fill": [true],
"outlier_replace": [false]
},
"dataio_1": {
"with_label": [true],
"label_name": ["y"],
"label_type": ["float"],
"output_format": ["dense"],
"missing_fill": [true],
"outlier_replace": [false]
},
"evaluation_0": {
"eval_type": ["regression"],
"pos_label": [1]
},
"evaluation_1": {
"eval_type": ["regression"],
"pos_label": [1]
}
},
"host": {
"args": {
"data": {
"train_data": [
{
"name": "homo_housing_1_train",
"namespace": "homo_guest_housing_train"
}
],
"eval_data": [
{
"name": "homo_housing_1_eval",
"namespace": "homo_guest_housing_eval"
}
]
}
},
"dataio_0": {
"with_label": [false],
"output_format": ["dense"],
"outlier_replace": [false]
},
"dataio_1": {
"with_label": [false],
"output_format": ["dense"],
"outlier_replace": [false]
},
"evaluation_0": {
"need_run": [false]
},
"evaluation_1":{
"need_run":[false]
}
}
},
"algorithm_parameters": {
"hetero_linr_0": {
"penalty": "L2",
"optimizer": "sgd",
"tol": 0.001,
"alpha": 0.01,
"max_iter": 20,
"early_stop": "weight_diff",
"batch_size": -1,
"learning_rate": 0.15,
"decay": 0.0,
"decay_sqrt": false,
"early_stopping_rounds": 1,
"validation_freqs": 5,
"metrics": [
"mean_absolute_error",
"root_mean_squared_error"
],
"use_first_metric_only": false,
"init_param": {"init_method": "zeros"},
"encrypted_mode_calculator_param": {"mode": "fast"}
}
}
}
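The following plaintext sketch gives some intuition for what `hetero_linr_0` computes with these parameters (`penalty` L2, `alpha`, `learning_rate`, and `batch_size` -1, which means full batch). It shows one gradient step for vertically split linear regression. Each party computes a partial prediction from its own features. From the shared residual each party can then compute the gradient for its own weights. As I understand it, FATE encrypts these exchanged intermediate values (Paillier) and uses the arbiter for decryption. This sketch omits all encryption. The intercept handling and the exact loss scaling are assumptions.

```python
def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def hetero_linr_step(xg, xh, y, wg, wh, learning_rate=0.15, alpha=0.01):
    """One full-batch gradient step for y ≈ xg·wg + xh·wh (no intercept).

    xg/wg: guest features/weights (the guest also holds y); xh/wh: host's.
    Loss = (1/2n) Σ d_i^2 with residual d_i = ŷ_i - y_i, plus an L2 term.
    Returns the updated weights and the loss before the update.
    """
    n = len(y)
    # each party computes its partial prediction locally
    u_guest = [dot(wg, row) for row in xg]
    u_host = [dot(wh, row) for row in xh]
    residual = [ug + uh - yi for ug, uh, yi in zip(u_guest, u_host, y)]

    def grad(x, w):
        # (1/n) Σ d_i x_ij plus the L2 term alpha * w_j
        return [sum(d * row[j] for d, row in zip(residual, x)) / n + alpha * w[j]
                for j in range(len(w))]

    gg, gh = grad(xg, wg), grad(xh, wh)
    new_wg = [w - learning_rate * g for w, g in zip(wg, gg)]
    new_wh = [w - learning_rate * g for w, g in zip(wh, gh)]
    loss = sum(d * d for d in residual) / (2 * n)
    return new_wg, new_wh, loss
```

Neither party needs the other's raw features, only the residual. That is why the residual is the value FATE protects.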
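5. Submit the job
python ../../../fate_flow/fate_flow_client.py -f submit_job -d test_hetero_linr_train_job_dsl.json -c test_hetero_linr_train_job_conf.json
(run this from the hetero_linear_regression directory, since the client path is relative)
- View the results
5.1 The weight of each feature can be viewed under hetero_linr_0
You will see output like the following.
Note: open it in the web UI (FATE Board).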

| Party A feature | Weight |
|---|---|
| x30 | -0.030827 |
| x32 | -0.015357 |
| x31 | -0.008134 |
| x34 | -0.005614 |
| … | … |

| Party B feature | Weight |
|---|---|
| x8 | 0.03296 |
| x9 | 0.00314 |
| x10 | 0.002589 |
| x0 | -0.00486 |
| … | … |
5.2 The evaluation output of the trained model can be viewed under evaluation_0
| id | y_prob |
|---|---|
| 0 | 0.9391219 |
| 1 | 0.9795526 |
| 3 | 0.9507830 |
| 4 | 0.9076095 |
| 5 | 0.9236436 |
| 8 | 0.9287811 |
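The conf requests `mean_absolute_error` and `root_mean_squared_error` as metrics. To sanity-check what FATE Board reports, you can recompute both from exported labels and predictions. The sketch below shows the two formulas. The values in its usage lines are made up and are not the results above.

```python
import math


def mean_absolute_error(y_true, y_pred):
    """MAE = (1/n) Σ |y_i - ŷ_i|"""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def root_mean_squared_error(y_true, y_pred):
    """RMSE = sqrt((1/n) Σ (y_i - ŷ_i)^2)"""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


if __name__ == "__main__":
    y_true = [1.0, 0.5, 0.8]  # hypothetical labels
    y_pred = [0.9, 0.7, 0.8]  # hypothetical predictions
    print(mean_absolute_error(y_true, y_pred), root_mean_squared_error(y_true, y_pred))
```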