The original cluster had no high availability and was running on low-spec hardware, so we decided to migrate the 3-shard, 1-replica cluster to a higher-spec 1-shard, 2-replica cluster. The data volume was fairly large.
Although clickhouse-copier has officially been marked as deprecated, we used it anyway for convenience and efficiency, and it worked quite well.
The steps are as follows:
- After downloading and extracting copier on the server, first prepare the keeper.xml file. The destination cluster runs clickhouse-keeper; the config section is still called zookeeper, but that makes no difference.
<!-- keeper.xml -->
<clickhouse>
<logger>
<level>trace</level>
<size>100M</size>
<log>./log/clickhouse-copier/copier/log.log</log>
<errorlog>./log/clickhouse-copier/copier/log.err.log</errorlog>
<count>3</count>
</logger>
<!-- Fill in the IP addresses of the corresponding keeper nodes -->
<zookeeper>
<node index="1">
<host>172.23.3.44</host>
<port>9181</port>
</node>
<node index="2">
<host>172.23.3.55</host>
<port>9181</port>
</node>
<node index="3">
<host>172.23.3.66</host>
<port>9181</port>
</node>
</zookeeper>
</clickhouse>
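Before kicking off the migration, it is worth checking that the host running copier can reach keeper at all. A minimal sketch, assuming clickhouse-keeper's four-letter-word commands are enabled (ruok is in the default allow list):
# Each keeper node should answer "imok"
for host in 172.23.3.44 172.23.3.55 172.23.3.66; do
    echo ruok | nc "$host" 9181
    echo    # keeper closes the connection after replying, so nc exits
done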
- Prepare the migration task file task_migrate_all.xml. The tables to migrate can all go in one file or be split across several files, but each of them must exist on every node of the source cluster; you can create empty tables to pass that check (a sketch of this follows the task file below), otherwise copier errors out.
<clickhouse>
<remote_servers>
<!-- Source cluster addresses -->
<source_cluster>
<shard>
<internal_replication>false</internal_replication>
<replica>
<host>172.23.3.11</host>
<port>9000</port>
<user>user_name</user>
<password>pwd</password>
</replica>
</shard>
<shard>
<internal_replication>false</internal_replication>
<replica>
<host>172.23.3.12</host>
<port>9000</port>
<user>user_name</user>
<password>pwd</password>
</replica>
</shard>
<shard>
<internal_replication>false</internal_replication>
<replica>
<host>172.23.3.13</host>
<port>9000</port>
<user>user_name</user>
<password>pwd</password>
</replica>
</shard>
</source_cluster>
<!-- Destination cluster addresses -->
<destination_cluster>
<shard>
<internal_replication>true</internal_replication>
<replica>
<host>172.23.3.44</host>
<port>9000</port>
<user>user_name</user>
<password>pwd</password>
</replica>
<replica>
<host>172.23.3.55</host>
<port>9000</port>
<user>user_name</user>
<password>pwd</password>
</replica>
</shard>
</destination_cluster>
</remote_servers>
<!-- Maximum number of concurrent workers -->
<max_workers>12</max_workers>
<!-- Read-only access to the source cluster -->
<settings_pull>
<readonly>1</readonly>
</settings_pull>
<!-- Setting used to insert (push) data to destination cluster tables -->
<settings_push>
<readonly>0</readonly>
</settings_push>
<settings>
<connect_timeout>3</connect_timeout>
<!-- Sync insert is set forcibly, leave it here just in case. -->
<distributed_foreground_insert>1</distributed_foreground_insert>
</settings>
<!-- Tables to migrate; multiple tables can be listed under <tables> -->
<tables>
<table_sms_send>
<cluster_pull>source_cluster</cluster_pull>
<database_pull>test_qwe</database_pull>
<table_pull>sms_send</table_pull>
<cluster_push>destination_cluster</cluster_push>
<database_push>test_qwe</database_push>
<table_push>sms_send</table_push>
<!-- With multiple replicas a Replicated* engine is required; otherwise only a single node ends up with data -->
<engine>
ENGINE = ReplicatedReplacingMergeTree('/clickhouse/tables/{cluster}/{shard}/{database}/{table}', '{replica}')
PARTITION BY toYYYYMM(create_time)
ORDER BY (id, phone_number, state, create_time)
</engine>
<!-- A single shard needs no real sharding key, so a constant 1 is used -->
<sharding_key>1</sharding_key>
</table_sms_send>
<table_use_car>
<cluster_pull>source_cluster</cluster_pull>
<database_pull>test_qwe</database_pull>
<table_pull>use_car</table_pull>
<cluster_push>destination_cluster</cluster_push>
<database_push>test_qwe</database_push>
<table_push>use_car</table_push>
<engine>
ENGINE = ReplicatedReplacingMergeTree('/clickhouse/tables/{cluster}/{shard}/{database}/{table}', '{replica}', add_time)
ORDER BY (id, detail_id)
</engine>
<sharding_key>1</sharding_key>
</table_use_car>
</tables>
</clickhouse>
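About the existence check mentioned above: if a table is missing on some source node, the easiest fix is to replay its DDL from a node that does have it, which creates the empty table copier expects. A quick sketch using clickhouse-client with the hosts and credentials from the task file:
# Grab the table's DDL from a node that has it (TSVRaw keeps newlines unescaped)...
DDL=$(clickhouse-client --host 172.23.3.11 --user user_name --password pwd \
    --query "SHOW CREATE TABLE test_qwe.sms_send" --format TSVRaw)

# ...and replay it on each source node that is missing the table
clickhouse-client --host 172.23.3.13 --user user_name --password pwd --query "$DDL"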
- Run the following command in the clickhouse-copier extraction directory to start the migration.
# --task-path is the path in keeper under which the migration task is stored
./clickhouse-copier --config keeper.xml --task-path /clickhouse/copier_task/mig_test --task-file=task_migrate_all.xml
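Progress can be watched through the task state copier keeps under --task-path, and after it finishes the row counts of both clusters should line up (the destination uses ReplacingMergeTree, so merges may collapse duplicates and make its count slightly lower). A minimal sketch reusing the hosts and credentials from the task file:
# Task state written by copier under --task-path
clickhouse-client --host 172.23.3.44 --user user_name --password pwd \
    --query "SELECT name FROM system.zookeeper WHERE path = '/clickhouse/copier_task/mig_test'"

# Rows on the destination (any replica works)
clickhouse-client --host 172.23.3.44 --user user_name --password pwd \
    --query "SELECT count() FROM test_qwe.sms_send"

# Rows summed across the three source shards via the remote() table function
clickhouse-client --host 172.23.3.44 --user user_name --password pwd \
    --query "SELECT count() FROM remote('172.23.3.11,172.23.3.12,172.23.3.13', test_qwe.sms_send, 'user_name', 'pwd')"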