postgreGroovy Sql WithBatch 在数据库中缺少记录



我正在使用Groovy Sql.withBatch来处理CSV文件并将所有数据加载到我的Postgres DB中。

这是我的方法:

def processCSV() {
    def logger = Logger.getLogger('groovy.sql')
    logger.level = Level.FINE
    logger.addHandler(new ConsoleHandler(level: Level.FINE))
    def fileName = "file.csv"
    def resource = this.getClass().getResource( '/csv/' + fileName )
    File file = new File(resource.path)
    String year = '2016'
    char separator = ','
    def lines = CSV
            .separator(separator)
            .skipLines(1)
            .quote(CSVParser.DEFAULT_QUOTE_CHARACTER)
            .escape(CSVParser.DEFAULT_ESCAPE_CHARACTER)
            .charset('UTF-8')
            .create()
            .reader(file)
            .readAll()
    def totalLines = lines.size()
    Sql sql = getDatabaseInstance()
    println("Delete existing rows for " + year + " if exists")
    String dQuery = "DELETE FROM table1 WHERE year = ?"
    sql.execute(dQuery, [year])
    def statement = 'INSERT INTO table1 (column1, column2, column3, coulmn4, year) VALUES (?, ?, ?, ?, ?)'
    println("Total lines in the CSV files: " + totalLines)
    def batches = []
    sql.withBatch(BATCH_SIZE, statement) { ps ->
        lines.each { fields ->
            String coulmn1 = fields[0]
            String coulmn2 = fields[1]
            String column3 = fields[2]
            String column4 = fields[3]
            def params = [column1, coulmn2, column3, column4, year]
            def batch = ['params': params, 'error': false]
            try {
                ps.addBatch(params)
            }
            catch (all) {
                batch['error'] = true
                throw all
            }
            batches << batch
        }
    }
    def recordsAddedInDB = sql.firstRow("SELECT count(*) FROM " + tableName + " WHERE year = ?", year)[0]
    sql.close()
    println("")
    println("Processed lines: " + line)
    println("Batches: " + batches.size())
    println("Batches in error: " + batches.findAll{ it.error }.size())
    println("Record in DB for " + year + ": " + recordsAddedInDB)
}

CSV 文件中的行(不包括标题行(为 23758。此方法的输出如下所示:

Delete existing rows for 2016 if exists
Total lines in the CSV files: 23758
Processed lines: 23758
Batches: 23758
Batches in error: 0
Record in DB for 2016 year: 23580

如果我启用日志记录,BATCH_SIZE为 500,我可以看到:

  • 47 次,句子"成功执行批处理,包含 500 个命令">
  • 1 次句子"使用 258 个命令成功执行批处理">

这意味着已处理 23758 插入语句。

有人知道为什么数据库中的行数比处理的行少吗?

已解决。INSERT 语句有一个子查询,当子查询不返回值时,将忽略 INSERT 语句。

最新更新