PreparedStatement批量插入/更新数据优化——setObject与set具体类型的效率差异
问题起因
因为项目上需求,我们需要把表XMJBXX_WX的数据清洗一遍后,插入到XMJBXX_WX_1中。原本XMJBXX_WX与XMJBXX_WX_1的表结构有些许的不同,但是为了简化问题,突出效率优化,这里就把XMJBXX_WX和XMJBXX_WX_1认为是表结构相同的表了。
问题代码
这个代码写起来非常简单,就是一个常规的select * from XMJBXX_WX
,然后遍历结果集,将数据分批插入到XMJBXX_WX_1中,代码如下:
public void tryTransferData1(String sourceTable, String tarTable) {
logger.info("开始清空{}的业务数据", tarTable);
jdbcTemplate.execute("truncate table " + tarTable);
logger.info("{}的业务数据清空完毕", tarTable);
logger.info("开始查询源表{}元数据信息", sourceTable);
LocalDateTime startTime = LocalDateTime.now();
String sql = "select COLUMN_NAME as \"columnName\" from USER_TAB_COLUMNS where TABLE_NAME=?";
List<String> columnList = jdbcTemplate.queryForList(sql, String.class, sourceTable);
logger.info("{}元数据信息查询完毕,耗时:[{}]", sourceTable, Duration.between(startTime, LocalDateTime.now()));
StringBuilder queryBuilder = new StringBuilder();
queryBuilder.append("select ");
for (String column : columnList) {
queryBuilder.append(column).append(" as ").append(column).append(",");
}
queryBuilder.setCharAt(queryBuilder.length() - 1, ' ');
queryBuilder.append("from ").append(sourceTable);
StringBuilder insertBuilder = new StringBuilder();
insertBuilder.append("insert into ").append(tarTable).append("(");
for (String column : columnList) {
insertBuilder.append(column).append(",");
}
insertBuilder.setCharAt(insertBuilder.length() - 1, ')');
insertBuilder.append(" VALUES (");
for (String column : columnList) {
insertBuilder.append("?,");
}
insertBuilder.setCharAt(insertBuilder.length() - 1, ')');
String insertSql = insertBuilder.toString();
jdbcTemplate.setFetchSize(2000);
int size = columnList.size();
startTime = LocalDateTime.now();
logger.info("开始查询{}的业务数据,sql:[{}]", sourceTable, queryBuilder);
jdbcTemplate.query(queryBuilder.toString(), new ResultSetExtractor<Object>() {
List<Map<String, Object>> dataList = new ArrayList<>();
int count = 0;
final int batchSize = 1000;
int insertCount = 1;
@Override
public Object extractData(ResultSet rs) throws SQLException, DataAccessException {
while (rs.next()) {
if (count == batchSize) {
insertData2TarTable();
}
Map<String, Object> dataMap = new HashMap<>();
for (int j = 0; j < size; j++) {
dataMap.put(columnList.get(j), rs.getObject(j + 1));
}
dataList.add(dataMap);
count++;
}
if (count != 0) {
insertData2TarTable();
}
return null;
}
private void insertData2TarTable() {
LocalDateTime begin = LocalDateTime.now();
logger.info("开始执行第{}次数据插入", insertCount);
jdbcTemplate.batchUpdate(insertSql, new BatchPreparedStatementSetter() {
@Override
public void setValues(PreparedStatement ps, int i) throws SQLException {
Map<String, Object> stringObjectMap = dataList.get(i);
for (int j = 1; j <= size; j++) {
ps.setObject(j,stringObjectMap.get(columnList.get(j-1)));
}
}
@Override
public int getBatchSize() {
return count;
}
});
logger.info("第{}次插入完毕,耗时:[{}]", insertCount, Duration.between(begin, LocalDateTime.now()));
insertCount++;
dataList = new ArrayList<>();
count = 0;
}
});
logger.info("{}的业务数据迁移完毕,耗时:[{}]", sourceTable, Duration.between(startTime, LocalDateTime.now()));
}
日志信息
从日志中可以看到,程序的执行效率远比想想中的慢的多,我设置了1000条批量处理一次,但是第一次插入耗时18秒,第二次26秒,第三次36秒,第四次更慢…
在XMJBXX_WX中一共有7万多条记录,按照这个效率跑下去,估计再有一天也跑不完…
问题排查
应用层面的问题出现问题之后,首先还是要排查一下应用本身是否存在问题
线程dump排查
做了几个线程后,发现整个系统阻塞在了jdbc上,如下图所示:
GC排查
虽然从线程dump上基本可以确定问题与jdbc有关,但是本着严谨的态度,还是需要排查一下应用系统gc情况。gc情况如下,观察了一会儿发现gc没有问题。
问题定位
既然初步确定问题与数据有关,那先从数据库开始排查吧。Oracle在这方面做的还是挺好的,我们可以直接用AWR日志分析。以项目上反馈的AWR日志为参考,可以发现我们的insert语句存在3714个版本,这种传递变量的sql怎么会有这么多个版本呢?
SQL信息如下:
SQL_ID | SQL_TEXT |
---|---|
ctx266bmzvm6m | insert /+append/ into XMJBXX_J nologging(DATATIME, FLOATORDER, VERSION, ZBTYPE, MD_PROJECT, XMMC, HTJE, ZXWCZJ, KGRQ, JGRQ, YJZSR, YCJYSR, ESCJYSR, YJZCB, YCJYCB, ESCJYCB, ZLR, YCJYLR, ESCJYLR, LRL, YCJYLRL, ESCLRL, ESCJYPJLRL, CHLRL, ZZSSLSR, ZZSSLZYFB, ZZSSLQTCL, ZZSSLLW, ZZSSLGC, ZZSSLSH, ZZSSLJX, ZZSSLQTZJF, ZZSSLXMJF, FJSSL, CBL, SKBL, FKBLZYFB, FKBLLW, FKBLGC, FKBLSH, FKBLQTCL, FKBLJX, FKBLQTZJFY, FKBLXMJF, NBDKLL, WBDKLL, FJXMZT, CCSQ, MD_CURRENCY, FCRQ, ZZSSLCL, JSRQ, LJWGCZMB, JHZJLRGCK, JHZJLRQTLR, JHZJLRHJ, JHZJLCCL, JHZJLCLW, JHZJLCFB, JHZJLCQT, JHZJLCHJ, ZJCJNBDK, ZJCJWBDK, ZJCJHK, QCZJJY, QMZJYQ, LJWGCZ, ZJLRQTSFKBL, ZJLRGCKSFKBL, ZJLRHJSFKBL, LJSKBL, ZJLCCLSFKBL, ZJLCLWSFKBL, ZJLCFBSFKBL, ZJLCHJSFKBL, JYLRL, ZJCJNBDKSFKBL, ZJCJWBDKSFKBL, ZJCJHKSFKBL, ZJCJHJSFKBL, ZJLCQTSFKBL, LJSK, ZJCJHJ_NEW, RECID, RECVER, UNITID, MD_CCSQ, MD_SCENE, MD_STAGE, MD_MGRVER)values (:1 , :2 , :3 , :4 , :5 , :6 , :7 , :8 , :9 , :10 , :11 , :12 , :13 , :14 , :15 , :16 , :17 , :18 , :19 , :20 , :21 , :22 , :23 , :24 , :25 , :26 , :27 , :28 , :29 , :30 , :31 , :32 , :33 , :34 , :35 , :36 , :37 , :38 , :39 , :40 , :41 , :42 , :43 , :44 , :45 , :46 , :47 , :48 , :49 , :50 , :51 , :52 , :53 , :54 , :55 , :56 , :57 , :58 , :59 , :60 , :61 , :62 , :63 , :64 , :65 , :66 , :67 , :68 , :69 , :70 , :71 , :72 , :73 , :74 , :75 , :76 , :77 , :78 , :79 , :80 , :81 , :82 , :83 , :84 , :85 , :86 , :87 , :88 , :89 , :90 ) |
接下来查询一下数据库中存储的变量绑定信息
SELECT NAME, VALUE_STRING, DATATYPE_STRING, COUNT(*)
FROM v$sql_bind_capture
WHERE SQL_ID = 'ctx266bmzvm6m'
AND NAME = ':12'
GROUP BY NAME, VALUE_STRING, DATATYPE_STRING;
查询结果:
从查询结果上看,:12
变量确实被绑定了多个数据类型。:12
变量对应的字段字段在数据库中是NUMBER
类型的,变量绑定成VARCHAR2(32)
肯定是有问题的。 问题是找到了,但是这个问题是如何引起的呢?
问题追踪
由于这次的需求是要做数据的迁移,所以开发的小哥并不知道每个字段都是什么类型的,方便起见就直接用了PreparedStatement.setObject(index,value)
,同时又存在很多value为空或value数据类型不准确的情况,导致数据库需要隐式类型转换,所以效率就低了。
问题修改
数据迁移方法,下面的代码中最核心的改动就是oracleDataType.doPreparedStatementSet(ps, j, stringObjectMap.get(columnList.get(index)));
public void tryTransferData(String sourceTable, String tarTable) {
logger.info("开始清空{}的业务数据", tarTable);
jdbcTemplate.execute("truncate table " + tarTable);
logger.info("{}的业务数据清空完毕", tarTable);
logger.info("开始查询源表{}元数据信息", sourceTable);
LocalDateTime startTime = LocalDateTime.now();
String sql = "select DATA_TYPE as \"dataType\",COLUMN_NAME as \"columnName\" from USER_TAB_COLUMNS where TABLE_NAME=?";
List<Map<String, Object>> tableColumnList = jdbcTemplate.queryForList(sql, sourceTable);
logger.info("{}元数据信息查询完毕,耗时:[{}]", sourceTable, Duration.between(startTime, LocalDateTime.now()));
List<String> columnList = new ArrayList<>(tableColumnList.size());
List<OracleDataType> dataTypeList = new ArrayList<>(tableColumnList.size());
for (Map<String, Object> columnMap : tableColumnList) {
columnList.add((String) columnMap.get("columnName"));
OracleDataType dataType = OracleDataType.get((String) columnMap.get("dataType"));
dataTypeList.add(dataType);
}
StringBuilder queryBuilder = new StringBuilder();
queryBuilder.append("select ");
for (String column : columnList) {
queryBuilder.append(column).append(" as ").append(column).append(",");
}
queryBuilder.setCharAt(queryBuilder.length() - 1, ' ');
queryBuilder.append("from ").append(sourceTable);
StringBuilder insertBuilder = new StringBuilder();
insertBuilder.append("insert into ").append(tarTable).append("(");
for (String column : columnList) {
insertBuilder.append(column).append(",");
}
insertBuilder.setCharAt(insertBuilder.length() - 1, ')');
insertBuilder.append(" VALUES (");
for (String column : columnList) {
insertBuilder.append("?,");
}
insertBuilder.setCharAt(insertBuilder.length() - 1, ')');
String insertSql = insertBuilder.toString();
jdbcTemplate.setFetchSize(2000);
int size = columnList.size();
startTime = LocalDateTime.now();
logger.info("开始查询{}的业务数据,sql:[{}]", sourceTable, queryBuilder);
jdbcTemplate.query(queryBuilder.toString(), new ResultSetExtractor<Object>() {
List<Map<String, Object>> dataList = new ArrayList<>();
int count = 0;
final int batchSize = 1000;
int insertCount = 1;
@Override
public Object extractData(ResultSet rs) throws SQLException, DataAccessException {
while (rs.next()) {
if (count == batchSize) {
insertData2TarTable();
}
Map<String, Object> dataMap = new HashMap<>();
for (int j = 0; j < size; j++) {
dataMap.put(columnList.get(j), rs.getObject(j + 1));
}
dataList.add(dataMap);
count++;
}
if (count != 0) {
insertData2TarTable();
}
return null;
}
private void insertData2TarTable() {
LocalDateTime begin = LocalDateTime.now();
logger.info("开始执行第{}次数据插入", insertCount);
jdbcTemplate.batchUpdate(insertSql, new BatchPreparedStatementSetter() {
@Override
public void setValues(PreparedStatement ps, int i) throws SQLException {
Map<String, Object> stringObjectMap = dataList.get(i);
for (int j = 1; j <= size; j++) {
int index = j - 1;
OracleDataType oracleDataType = dataTypeList.get(index);
//核心改动,将setObject替换为set具体的类型
oracleDataType.doPreparedStatementSet(ps, j, stringObjectMap.get(columnList.get(index)));
}
}
@Override
public int getBatchSize() {
return count;
}
});
logger.info("第{}次插入完毕,耗时:[{}]", insertCount, Duration.between(begin, LocalDateTime.now()));
insertCount++;
dataList = new ArrayList<>();
count = 0;
}
});
logger.info("{}的业务数据迁移完毕,耗时:[{}]", sourceTable, Duration.between(startTime, LocalDateTime.now()));
}
数据库字段类型映射
public enum OracleDataType {
TIMESTAMP_3 {
@Override
public void doPreparedStatementSet(PreparedStatement ps, int index, Object value) throws SQLException {
if (value == null) {
ps.setTimestamp(index, null);
return;
}
ps.setTimestamp(index, ((TIMESTAMP) value).timestampValue());
}
},
NUMBER {
@Override
public void doPreparedStatementSet(PreparedStatement ps, int index, Object value) throws SQLException {
ps.setBigDecimal(index, (BigDecimal) value);
}
},
NVARCHAR2 {
@Override
public void doPreparedStatementSet(PreparedStatement ps, int index, Object value) throws SQLException {
ps.setString(index, (String) value);
}
},
RAW {
@Override
public void doPreparedStatementSet(PreparedStatement ps, int index, Object value) throws SQLException {
ps.setBytes(index, (byte[]) value);
}
};
public void doPreparedStatementSet(PreparedStatement ps, int index, Object value) throws SQLException {
}
;
public static OracleDataType get(String value) {
switch (value) {
case "TIMESTAMP(3)":
return TIMESTAMP_3;
case "NUMBER":
return NUMBER;
case "NVARCHAR2":
return NVARCHAR2;
case "RAW":
return RAW;
}
throw new RuntimeException("需要支持的数据类型【" + value + "】");
}
}
问题修改后的日志
从修改后的日志看,数据插入的效率明显的提高了
总结
使用PreparedStatement
做数据更新时,尽可能不要使用setObject
方法,数据库中字段是什么类型,就set什么。还是不要太省事儿了。-_-||