How to read a large CSV file in batches in Node.js

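I have a CSV file that contains more than 500,000 records. The fields of the CSV are

Name
Age
Branch

I need to process all the records in the file without loading a lot of data into memory. I need to read a few records, insert them into a collection and manipulate them, and then continue reading the remaining records. As I am new to this, I am unable to understand how it works. If I try to print the batch, it prints buffered data. Will the code below meet my requirement? With that buffered value, how can I get the CSV records and insert and manipulate the file data?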

var stream = fs.createReadStream(csvFilePath)
    .pipe(csv())
    .on('data', (data) => {
        batch.push(data)
        counter++;
        if (counter == 100) {
            stream.pause()
            setTimeout(() => {
                console.log("batch in ", data)
                counter = 0;
                batch = []
                stream.resume()
            }, 5000)
        }
    })
    .on('error', (e) => {
        console.log("er ", e);
    })
    .on('end', () => {
        console.log("end");
    })

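I have written some sample code for you that shows how to work with streams. You basically create a stream and keep consuming its chunks. A chunk is an object of type Buffer. To process it as text, call toString().

I don't have much time to explain, but the comments should help.

Also consider using a module, since a lot of work has already been done on CSV parsing. Hope this helps.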

import * as fs from 'fs'

// The delimiter used in the CSV.
const delimiter = ','
// Matches both \n and \r\n, so it does not depend on the system that wrote the file.
const lineBreak = /\r?\n/

// Add your own implementation of parsing a portion of the text here.
// `lines` holds only complete lines.
const parseChunk = (lines, index) => {
    // First chunk: the header is included here.
    if (index === 0) {
        // The first row is the header, so take it out of the lines
        // that are processed further.
        const headerLine = lines.shift()
        // Do something with the header here.
    }

    // Now you have a part of the file to process without the header.
    // The CSV parse function you need to figure out yourself. It is best
    // to use a module for that, since there are plenty of edge cases
    // when parsing CSV.
    // Custom CSV parser here => https://stackoverflow.com/questions/1293147/example-javascript-code-to-parse-csv-data
    // If the CSV is well formatted, a plain split could be enough.
    for (const line of lines) {
        if (line === '') continue // skip blank lines
        const values = line.split(delimiter)
        console.log('line received', values)
        // StoreToDb(values)
    }
}

// Create the stream.
const stream = fs.createReadStream('file.csv')
// Counts the chunks, to know whether the header is included.
let chunkCount = 0
// A chunk can end in the middle of a line. The incomplete tail is kept
// here and prepended to the next chunk.
let remainder = ''

// Handle the data event of the stream.
stream.on('data', chunk => {
    // The stream sends you a Buffer.
    // To have it as text, convert it to a string.
    const text = remainder + chunk.toString()
    // Chunks have a bounded size (64 KiB by default for fs streams)
    // and usually contain multiple lines.
    const lines = text.split(lineBreak)
    remainder = lines.pop()
    // Only count chunks that yielded at least one complete line,
    // so that index 0 always carries the header.
    if (lines.length > 0) {
        parseChunk(lines, chunkCount)
        // Increment the count.
        chunkCount++
    }
})

stream.on('end', () => {
    // Process the last line if the file does not end with a line break.
    if (remainder !== '') parseChunk([remainder], chunkCount)
    console.log('parsing finished')
})

stream.on('error', (err) => {
    // Error: handle it properly here, maybe roll back the changes already made
    // to the db and parse again. You may also use chunkCount to restart the
    // parsing and skip the first x chunks, so you can resume at a given point.
    console.log('parsing error ', err)
})
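
For the batching the question asks about (read a few records, insert them, then continue with the rest), a minimal sketch using only Node's built-in readline module could look like the following. The names processCsvInBatches and handleBatch are hypothetical, not part of any library. The sketch assumes the first line is the header and that fields contain no quoted commas or embedded line breaks, which a real CSV module would handle for you.

```javascript
const fs = require('fs');
const readline = require('readline');

// Hypothetical helper: reads a CSV file line by line and passes the rows to
// `handleBatch` in groups of at most `batchSize`.
// Assumptions: the first non-empty line is the header, and no field
// contains the delimiter or a line break.
async function processCsvInBatches(filePath, batchSize, handleBatch) {
  const rl = readline.createInterface({
    input: fs.createReadStream(filePath, { encoding: 'utf8' }),
    crlfDelay: Infinity, // treat \r\n as a single line break
  });

  let header = null;
  let batch = [];
  let total = 0;

  for await (const line of rl) {
    if (line.trim() === '') continue; // skip blank lines

    const values = line.split(',');
    if (header === null) {
      header = values; // the first line names the fields
      continue;
    }

    // Build one record object keyed by the header names.
    const record = {};
    header.forEach((name, i) => { record[name] = values[i]; });
    batch.push(record);

    if (batch.length === batchSize) {
      // The loop does not ask for the next line until this resolves.
      await handleBatch(batch);
      total += batch.length;
      batch = [];
    }
  }

  // Flush the last, possibly smaller, batch.
  if (batch.length > 0) {
    await handleBatch(batch);
    total += batch.length;
  }
  return total;
}

// Example call (StoreToDb is a placeholder, as in the code above):
// processCsvInBatches('file.csv', 100, async (batch) => {
//   for (const record of batch) { /* StoreToDb(record) */ }
// }).then((count) => console.log('records processed:', count));
```

Because the callback is awaited before the loop continues, only about one batch of records is held at a time, plus whatever readline buffers internally. The resolved value is the number of data records handed to the callback.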
