PreparedStatement批量插入/更新数据优化——setObject与set具体类型的效率差异

发布于:2022-11-29 ⋅ 阅读:(162) ⋅ 点赞:(0)

PreparedStatement批量插入/更新数据优化——setObject与set具体类型的效率差异

问题起因

因为项目上需求,我们需要把表XMJBXX_WX的数据清洗一遍后,插入到XMJBXX_WX_1中。原本XMJBXX_WXXMJBXX_WX_1的表结构有些许的不同,但是为了简化问题,突出效率优化,这里就把XMJBXX_WXXMJBXX_WX_1认为是表结构相同的表了。

问题代码

这个代码写起来非常简单,就是一个常规的select * from XMJBXX_WX,然后遍历结果集,将数据分批插入到XMJBXX_WX_1中,代码如下:

public void tryTransferData1(String sourceTable, String tarTable) {
    logger.info("开始清空{}的业务数据", tarTable);
    jdbcTemplate.execute("truncate table " + tarTable);
    logger.info("{}的业务数据清空完毕", tarTable);
    logger.info("开始查询源表{}元数据信息", sourceTable);
    LocalDateTime startTime = LocalDateTime.now();
    String sql = "select COLUMN_NAME as \"columnName\" from USER_TAB_COLUMNS where TABLE_NAME=?";
    List<String> columnList = jdbcTemplate.queryForList(sql, String.class, sourceTable);
    logger.info("{}元数据信息查询完毕,耗时:[{}]", sourceTable, Duration.between(startTime, LocalDateTime.now()));
    StringBuilder queryBuilder = new StringBuilder();
    queryBuilder.append("select ");
    for (String column : columnList) {
        queryBuilder.append(column).append(" as ").append(column).append(",");
    }
    queryBuilder.setCharAt(queryBuilder.length() - 1, ' ');
    queryBuilder.append("from ").append(sourceTable);
    StringBuilder insertBuilder = new StringBuilder();
    insertBuilder.append("insert into ").append(tarTable).append("(");
    for (String column : columnList) {
        insertBuilder.append(column).append(",");
    }
    insertBuilder.setCharAt(insertBuilder.length() - 1, ')');
    insertBuilder.append(" VALUES (");
    for (String column : columnList) {
        insertBuilder.append("?,");
    }
    insertBuilder.setCharAt(insertBuilder.length() - 1, ')');
    String insertSql = insertBuilder.toString();
    jdbcTemplate.setFetchSize(2000);
    int size = columnList.size();
    startTime = LocalDateTime.now();
    logger.info("开始查询{}的业务数据,sql:[{}]", sourceTable, queryBuilder);
    jdbcTemplate.query(queryBuilder.toString(), new ResultSetExtractor<Object>() {

        List<Map<String, Object>> dataList = new ArrayList<>();
        int count = 0;
        final int batchSize = 1000;
        int insertCount = 1;

        @Override
        public Object extractData(ResultSet rs) throws SQLException, DataAccessException {
            while (rs.next()) {
                if (count == batchSize) {
                    insertData2TarTable();
                }
                Map<String, Object> dataMap = new HashMap<>();
                for (int j = 0; j < size; j++) {
                    dataMap.put(columnList.get(j), rs.getObject(j + 1));
                }
                dataList.add(dataMap);
                count++;
            }
            if (count != 0) {
                insertData2TarTable();
            }
            return null;
        }

        private void insertData2TarTable() {
            LocalDateTime begin = LocalDateTime.now();
            logger.info("开始执行第{}次数据插入", insertCount);
            jdbcTemplate.batchUpdate(insertSql, new BatchPreparedStatementSetter() {
                @Override
                public void setValues(PreparedStatement ps, int i) throws SQLException {
                    Map<String, Object> stringObjectMap = dataList.get(i);
                    for (int j = 1; j <= size; j++) {
                        ps.setObject(j,stringObjectMap.get(columnList.get(j-1)));
                    }
                }

                @Override
                public int getBatchSize() {
                    return count;
                }
            });
            logger.info("第{}次插入完毕,耗时:[{}]", insertCount, Duration.between(begin, LocalDateTime.now()));
            insertCount++;
            dataList = new ArrayList<>();
            count = 0;
        }
    });
    logger.info("{}的业务数据迁移完毕,耗时:[{}]", sourceTable, Duration.between(startTime, LocalDateTime.now()));
}

日志信息

从日志中可以看到,程序的执行效率远比想想中的慢的多,我设置了1000条批量处理一次,但是第一次插入耗时18秒,第二次26秒,第三次36秒,第四次更慢…
在这里插入图片描述
XMJBXX_WX中一共有7万多条记录,按照这个效率跑下去,估计再有一天也跑不完…

问题排查

应用层面的问题出现问题之后,首先还是要排查一下应用本身是否存在问题

线程dump排查

做了几个线程后,发现整个系统阻塞在了jdbc上,如下图所示:
在这里插入图片描述

GC排查

虽然从线程dump上基本可以确定问题与jdbc有关,但是本着严谨的态度,还是需要排查一下应用系统gc情况。gc情况如下,观察了一会儿发现gc没有问题。
在这里插入图片描述

问题定位

既然初步确定问题与数据有关,那先从数据库开始排查吧。Oracle在这方面做的还是挺好的,我们可以直接用AWR日志分析。以项目上反馈的AWR日志为参考,可以发现我们的insert语句存在3714个版本,这种传递变量的sql怎么会有这么多个版本呢?
在这里插入图片描述
SQL信息如下:

SQL_ID SQL_TEXT
ctx266bmzvm6m insert /+append/ into XMJBXX_J nologging(DATATIME, FLOATORDER, VERSION, ZBTYPE, MD_PROJECT, XMMC, HTJE, ZXWCZJ, KGRQ, JGRQ, YJZSR, YCJYSR, ESCJYSR, YJZCB, YCJYCB, ESCJYCB, ZLR, YCJYLR, ESCJYLR, LRL, YCJYLRL, ESCLRL, ESCJYPJLRL, CHLRL, ZZSSLSR, ZZSSLZYFB, ZZSSLQTCL, ZZSSLLW, ZZSSLGC, ZZSSLSH, ZZSSLJX, ZZSSLQTZJF, ZZSSLXMJF, FJSSL, CBL, SKBL, FKBLZYFB, FKBLLW, FKBLGC, FKBLSH, FKBLQTCL, FKBLJX, FKBLQTZJFY, FKBLXMJF, NBDKLL, WBDKLL, FJXMZT, CCSQ, MD_CURRENCY, FCRQ, ZZSSLCL, JSRQ, LJWGCZMB, JHZJLRGCK, JHZJLRQTLR, JHZJLRHJ, JHZJLCCL, JHZJLCLW, JHZJLCFB, JHZJLCQT, JHZJLCHJ, ZJCJNBDK, ZJCJWBDK, ZJCJHK, QCZJJY, QMZJYQ, LJWGCZ, ZJLRQTSFKBL, ZJLRGCKSFKBL, ZJLRHJSFKBL, LJSKBL, ZJLCCLSFKBL, ZJLCLWSFKBL, ZJLCFBSFKBL, ZJLCHJSFKBL, JYLRL, ZJCJNBDKSFKBL, ZJCJWBDKSFKBL, ZJCJHKSFKBL, ZJCJHJSFKBL, ZJLCQTSFKBL, LJSK, ZJCJHJ_NEW, RECID, RECVER, UNITID, MD_CCSQ, MD_SCENE, MD_STAGE, MD_MGRVER)values (:1 , :2 , :3 , :4 , :5 , :6 , :7 , :8 , :9 , :10 , :11 , :12 , :13 , :14 , :15 , :16 , :17 , :18 , :19 , :20 , :21 , :22 , :23 , :24 , :25 , :26 , :27 , :28 , :29 , :30 , :31 , :32 , :33 , :34 , :35 , :36 , :37 , :38 , :39 , :40 , :41 , :42 , :43 , :44 , :45 , :46 , :47 , :48 , :49 , :50 , :51 , :52 , :53 , :54 , :55 , :56 , :57 , :58 , :59 , :60 , :61 , :62 , :63 , :64 , :65 , :66 , :67 , :68 , :69 , :70 , :71 , :72 , :73 , :74 , :75 , :76 , :77 , :78 , :79 , :80 , :81 , :82 , :83 , :84 , :85 , :86 , :87 , :88 , :89 , :90 )

接下来查询一下数据库中存储的变量绑定信息

SELECT NAME, VALUE_STRING, DATATYPE_STRING, COUNT(*)
  FROM v$sql_bind_capture
 WHERE SQL_ID = 'ctx266bmzvm6m'
   AND NAME = ':12'
 GROUP BY NAME, VALUE_STRING, DATATYPE_STRING;

查询结果:
在这里插入图片描述
从查询结果上看,:12变量确实被绑定了多个数据类型。:12变量对应的字段字段在数据库中是NUMBER类型的,变量绑定成VARCHAR2(32)肯定是有问题的。 问题是找到了,但是这个问题是如何引起的呢?

问题追踪

由于这次的需求是要做数据的迁移,所以开发的小哥并不知道每个字段都是什么类型的,方便起见就直接用了PreparedStatement.setObject(index,value)同时又存在很多value为空或value数据类型不准确的情况,导致数据库需要隐式类型转换,所以效率就低了。

问题修改

数据迁移方法,下面的代码中最核心的改动就是oracleDataType.doPreparedStatementSet(ps, j, stringObjectMap.get(columnList.get(index)));

public void tryTransferData(String sourceTable, String tarTable) {
    logger.info("开始清空{}的业务数据", tarTable);
    jdbcTemplate.execute("truncate table " + tarTable);
    logger.info("{}的业务数据清空完毕", tarTable);
    logger.info("开始查询源表{}元数据信息", sourceTable);
    LocalDateTime startTime = LocalDateTime.now();
    String sql = "select DATA_TYPE as \"dataType\",COLUMN_NAME as \"columnName\" from USER_TAB_COLUMNS where TABLE_NAME=?";
    List<Map<String, Object>> tableColumnList = jdbcTemplate.queryForList(sql, sourceTable);
    logger.info("{}元数据信息查询完毕,耗时:[{}]", sourceTable, Duration.between(startTime, LocalDateTime.now()));
    List<String> columnList = new ArrayList<>(tableColumnList.size());
    List<OracleDataType> dataTypeList = new ArrayList<>(tableColumnList.size());
    for (Map<String, Object> columnMap : tableColumnList) {
        columnList.add((String) columnMap.get("columnName"));
        OracleDataType dataType = OracleDataType.get((String) columnMap.get("dataType"));
        dataTypeList.add(dataType);
    }
    StringBuilder queryBuilder = new StringBuilder();
    queryBuilder.append("select ");
    for (String column : columnList) {
        queryBuilder.append(column).append(" as ").append(column).append(",");
    }
    queryBuilder.setCharAt(queryBuilder.length() - 1, ' ');
    queryBuilder.append("from ").append(sourceTable);
    StringBuilder insertBuilder = new StringBuilder();
    insertBuilder.append("insert into ").append(tarTable).append("(");
    for (String column : columnList) {
        insertBuilder.append(column).append(",");
    }
    insertBuilder.setCharAt(insertBuilder.length() - 1, ')');
    insertBuilder.append(" VALUES (");
    for (String column : columnList) {
        insertBuilder.append("?,");
    }
    insertBuilder.setCharAt(insertBuilder.length() - 1, ')');
    String insertSql = insertBuilder.toString();
    jdbcTemplate.setFetchSize(2000);
    int size = columnList.size();
    startTime = LocalDateTime.now();
    logger.info("开始查询{}的业务数据,sql:[{}]", sourceTable, queryBuilder);
    jdbcTemplate.query(queryBuilder.toString(), new ResultSetExtractor<Object>() {

        List<Map<String, Object>> dataList = new ArrayList<>();
        int count = 0;
        final int batchSize = 1000;
        int insertCount = 1;

        @Override
        public Object extractData(ResultSet rs) throws SQLException, DataAccessException {
            while (rs.next()) {
                if (count == batchSize) {
                    insertData2TarTable();
                }
                Map<String, Object> dataMap = new HashMap<>();
                for (int j = 0; j < size; j++) {
                    dataMap.put(columnList.get(j), rs.getObject(j + 1));
                }
                dataList.add(dataMap);
                count++;
            }
            if (count != 0) {
                insertData2TarTable();
            }
            return null;
        }

        private void insertData2TarTable() {
            LocalDateTime begin = LocalDateTime.now();
            logger.info("开始执行第{}次数据插入", insertCount);
            jdbcTemplate.batchUpdate(insertSql, new BatchPreparedStatementSetter() {
                @Override
                public void setValues(PreparedStatement ps, int i) throws SQLException {
                    Map<String, Object> stringObjectMap = dataList.get(i);
                    for (int j = 1; j <= size; j++) {
                        int index = j - 1;
                        OracleDataType oracleDataType = dataTypeList.get(index);
                        //核心改动,将setObject替换为set具体的类型
                        oracleDataType.doPreparedStatementSet(ps, j, stringObjectMap.get(columnList.get(index)));
                    }
                }

                @Override
                public int getBatchSize() {
                    return count;
                }
            });
            logger.info("第{}次插入完毕,耗时:[{}]", insertCount, Duration.between(begin, LocalDateTime.now()));
            insertCount++;
            dataList = new ArrayList<>();
            count = 0;
        }
    });
    logger.info("{}的业务数据迁移完毕,耗时:[{}]", sourceTable, Duration.between(startTime, LocalDateTime.now()));
}

数据库字段类型映射

public enum OracleDataType {
    TIMESTAMP_3 {
        @Override
        public void doPreparedStatementSet(PreparedStatement ps, int index, Object value) throws SQLException {
            if (value == null) {
                ps.setTimestamp(index, null);
                return;
            }
            ps.setTimestamp(index, ((TIMESTAMP) value).timestampValue());
        }
    },
    NUMBER {
        @Override
        public void doPreparedStatementSet(PreparedStatement ps, int index, Object value) throws SQLException {
            ps.setBigDecimal(index, (BigDecimal) value);
        }
    },
    NVARCHAR2 {
        @Override
        public void doPreparedStatementSet(PreparedStatement ps, int index, Object value) throws SQLException {
            ps.setString(index, (String) value);
        }
    },
    RAW {
        @Override
        public void doPreparedStatementSet(PreparedStatement ps, int index, Object value) throws SQLException {
            ps.setBytes(index, (byte[]) value);
        }
    };

    public void doPreparedStatementSet(PreparedStatement ps, int index, Object value) throws SQLException {
    }

    ;

    public static OracleDataType get(String value) {
        switch (value) {
            case "TIMESTAMP(3)":
                return TIMESTAMP_3;
            case "NUMBER":
                return NUMBER;
            case "NVARCHAR2":
                return NVARCHAR2;
            case "RAW":
                return RAW;
        }
        throw new RuntimeException("需要支持的数据类型【" + value + "】");
    }
}

问题修改后的日志

从修改后的日志看,数据插入的效率明显的提高了
在这里插入图片描述

总结

使用PreparedStatement做数据更新时,尽可能不要使用setObject方法,数据库中字段是什么类型,就set什么。还是不要太省事儿了。-_-||

本文含有隐藏内容,请 开通VIP 后查看