I am trying to load a relatively large CSV file into a DynamoDB table; the file has roughly 20,000,000 rows. However, after about 1,000,000 rows I get an out-of-memory dump:
<--- Last few GCs --->
136289 ms: Scavenge 1397.5 (1457.9) -> 1397.5 (1457.9) MB, 0.3 / 0 ms (+ 0.0 ms in 1 steps since last GC) [allocation failure] [incremental marking delaying mark-sweep].
137127 ms: Mark-sweep 1397.5 (1457.9) -> 1397.5 (1457.9) MB, 841.8 / 0 ms (+ 0.0 ms in 1 steps since start of marking, biggest step 0.0 ms) [last resort gc].
137989 ms: Mark-sweep 1397.5 (1457.9) -> 1397.5 (1457.9) MB, 858.6 / 0 ms [last resort gc].
<--- JS stacktrace --->
==== JS stack trace =========================================
Security context: 0000009B9BAB4639 <JS Object>
1: stringify [native json.js:157] [pc=000003139D3AB8C4] (this=0000009B9BAAE771 <a JSON with map 0000004A38909B69>,u=0000009B9BAD8B09 <an Object with map 000001D75FD60619>,v=0000009B9BA041B9 <undefined>,I=0000009B9BA041B9 <undefined>)
2: arguments adaptor frame: 1->3
3: buildRequest [c:\WorkspaceArchive\node_modules\aws-sdk\lib\protocol\json.js:~5] [pc=000003139D345857] (this=0000...
FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - process out of memory
Here is my code. Is there anything I can do?
var AWS = require('aws-sdk');
var fs = require('fs');
var readline = require('readline');

function processFile(fileName) {
    var docClient = new AWS.DynamoDB.DocumentClient();
    var lineReader = readline.createInterface({
        input: fs.createReadStream(fileName)
    });
    var batchRecords = [];
    lineReader.on('line', function (line) {
        var split = line.split(',');
        var obj = {
            field1: split[0],
            field2: split[1],
            field3: split[2],
            field4: split[3],
            field5: split[4],
            field6: split[5]
        };
        batchRecords.push(obj);
        // DynamoDB batchWrite accepts at most 25 items per request
        if (batchRecords.length == 25) {
            var putRequests = batchRecords.map((e) => {
                return {
                    PutRequest: {
                        Item: e
                    }
                };
            });
            var params = {
                RequestItems: {
                    "MyTable": putRequests
                }
            };
            // Comment out this line and it runs through ok
            docClient.batchWrite(params, function (err, data) {
                if (err) console.log(err, err.stack);
            });
            batchRecords = [];
        }
    });
    lineReader.on('close', function () {
        console.log('Done');
    });
}
You are reading the file correctly, line by line, without trying to hold all 20M rows in memory, so that part is not the memory problem.
But here, as you point out:
// Comment out this line and it runs through ok
docClient.batchWrite(params, function (err, data) {
    if (err) console.log(err, err.stack);
});
you reference data in the callback but never use it. The JavaScript GC does not like that. Try removing it and see if it makes any difference:
// Comment out this line and it runs through ok
docClient.batchWrite(params, function (err) {
    if (err) console.log(err, err.stack);
});
[EDIT]
OK, so my second guess relates to var batchRecords = [];, since it is declared outside the first callback. Try not batching at first: it is not optimal, but it reduces the amount of code involved and so improves your chances of finding the leak (see the sketch below).
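Here is a minimal sketch of that unbatched variant (it assumes the same table name and field layout as your code; one put per line, no batchRecords array):

var docClient = new AWS.DynamoDB.DocumentClient();
var lineReader = readline.createInterface({
    input: fs.createReadStream(fileName)
});
lineReader.on('line', function (line) {
    var split = line.split(',');
    // One unconditional put per line: slower than batching, but no
    // accumulated state is carried over between lines.
    docClient.put({
        TableName: 'MyTable',
        Item: {
            field1: split[0],
            field2: split[1],
            field3: split[2],
            field4: split[3],
            field5: split[4],
            field6: split[5]
        }
    }, function (err) {
        if (err) console.log(err, err.stack);
    });
});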
My last guess is that AWS.DynamoDB.DocumentClient is leaking internally.
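One way to test both guesses (a diagnostic sketch, not a fix): count lines and log heap usage every so often. If the heap keeps climbing with batchWrite commented out, the leak is in your code; if it only climbs with batchWrite enabled, suspect the client.

var lineCount = 0;
lineReader.on('line', function (line) {
    lineCount++;
    if (lineCount % 100000 === 0) {
        // heapUsed should stay roughly flat over time if nothing leaks
        console.log(lineCount + ' lines, heap used: ' +
            Math.round(process.memoryUsage().heapUsed / 1048576) + ' MB');
    }
});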
You might also decide not to care about the leak at all and run it with:

node --max-old-space-size=8192 script.js // do not limit the heap to the default ~1.4 GB; allow 8 GB
I do care, and I don't like doing that, but hey, I don't know what constraints you are under.
Have you raised the table's provisioned write throughput?
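If it is still at the default, your writes get throttled, and throttled requests that the SDK retries (plus the ones you keep issuing) can pile up in memory. You can raise it in the console, or with something like this sketch (the capacity numbers are placeholders, not recommendations):

var dynamodb = new AWS.DynamoDB();
dynamodb.updateTable({
    TableName: 'MyTable',
    ProvisionedThroughput: {
        ReadCapacityUnits: 5,     // placeholder: keep your current read capacity
        WriteCapacityUnits: 1000  // placeholder: size this for your import rate
    }
}, function (err) {
    if (err) console.log(err, err.stack);
});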