Importing into AWS DynamoDB with Node.js causes a memory leak



I'm trying to load a fairly large CSV file, roughly 20,000,000 rows, into a DynamoDB table. However, after about 1,000,000 rows I get a memory dump:

<--- Last few GCs --->
  136289 ms: Scavenge 1397.5 (1457.9) -> 1397.5 (1457.9) MB, 0.3 / 0 ms (+ 0.0 ms in 1 steps since last GC) [allocation failure] [incremental marking delaying mark-sweep].
  137127 ms: Mark-sweep 1397.5 (1457.9) -> 1397.5 (1457.9) MB, 841.8 / 0 ms (+ 0.0 ms in 1 steps since start of marking, biggest step 0.0 ms) [last resort gc].
  137989 ms: Mark-sweep 1397.5 (1457.9) -> 1397.5 (1457.9) MB, 858.6 / 0 ms [last resort gc].
<--- JS stacktrace --->
==== JS stack trace =========================================
Security context: 0000009B9BAB4639 <JS Object>
    1: stringify [native json.js:157] [pc=000003139D3AB8C4] (this=0000009B9BAAE771 <a JSON with map 0000004A38909B69>,u=0000009B9BAD8B09 <an Object with map 000001D75FD60619>,v=0000009B9BA041B9 <undefined>,I=0000009B9BA041B9 <undefined>)
    2: arguments adaptor frame: 1->3
    3: buildRequest [c:\Workspace\Archive\node_modules\aws-sdk\lib\protocol\json.js:~5] [pc=000003139D345857] (this=0000...
FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - process out of memory

Here is my code. Is there anything I can do?

// requires assumed from context; the original post omits them
var AWS = require('aws-sdk');
var fs = require('fs');
var readline = require('readline');

function processFile(fileName)
{
  var docClient = new AWS.DynamoDB.DocumentClient();
  var lineReader = readline.createInterface({
    input: fs.createReadStream(fileName)
  });
  var batchRecords = [];
  lineReader.on('line', function (line) {
    var split = line.split(',');
    var obj = {
      field1: split[0],
      field2: split[1],
      field3: split[2],
      field4: split[3],
      field5: split[4],
      field6: split[5]
    };
    batchRecords.push(obj);
    // BatchWriteItem accepts at most 25 items per request
    if (batchRecords.length === 25) {
      var putRequests = batchRecords.map((e) => {
        return {
          PutRequest: {
            Item: e
          }
        };
      });
      var params = {
        RequestItems: {
          "MyTable": putRequests
        }
      };
      // Comment out this line and it runs through OK
      docClient.batchWrite(params, function (err, data) {
        if (err) console.log(err, err.stack);
      });
      batchRecords = [];
    }
  });
  lineReader.on('close', function () {
    console.log('Done');
  });
}

You are reading the file correctly, line by line, without trying to hold all 20M rows in memory, so there is no memory problem there.

But here, as you pointed out yourself:

      // Comment out this line and it runs through OK
      docClient.batchWrite(params, function (err, data) {
        if (err) console.log(err, err.stack);
      });

you take a data parameter in the callback but never use it. The JavaScript GC doesn't like that. Try removing it and see if it makes any difference:

      // Comment out this line and it runs through OK
      docClient.batchWrite(params, function (err) {
        if (err) console.log(err, err.stack);
      });

[编辑]

OK, so my second guess has to do with var batchRecords = [];, since it is declared outside the first callback. Try not batching at all to start with; it's not optimal, but it cuts down the code and so improves your chances of finding the leak.

My last guess is that AWS.DynamoDB.DocumentClient is leaking internally.

You may also choose not to care about the leak and just run with a bigger heap:

node --max-old-space-size=8192 script.js  // raise the heap cap from the default ~1.4 GB to 8 GB

I would care, and I don't like doing that, but hey, I don't know what constraints you are working under.

Have you raised your provisioned write throughput?
