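I have retrieved a dump of the IMDB data (thanks to http://www.omdbapi.com/ and a small donation) as a TSV file (containing 1,111,073 lines). Each line represents one movie, and they look like this: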
ID imdbID Title Year Rating Runtime Genre Released Director Writer Cast Metacritic imdbRating imdbVotes Poster Plot FullPlot Language Country Awards lastUpdated
1 tt0000001 Carmencita 1894 NOT RATED 1 min Documentary, Short William K.L. Dickson Carmencita 5.8 1100 http://ia.media-imdb.com/images/M/MV5BMjAzNDEwMzk3OV5BMl5BanBnXkFtZTcwOTk4OTM5Ng@@._V1_SX300.jpg Performing on what looks like a small wooden stage, wearing a dress with a hoop skirt and white high-heeled pumps, Carmencita does a dance with kicks and twirls, a smile always on her face. Performing on what looks like a small wooden stage, wearing a dress with a hoop skirt and white high-heeled pumps, Carmencita does a dance with kicks and twirls, a smile always on her face. USA 2015-12-10 01:09:33.043000000
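My goal is to visualize how film length has changed over time. For that I need to create two arrays, one holding the min/max values and one holding the average value per year (because the Highcharts "area range and line" chart type requires this format). So I wrote a script that works fine on a small subset, but unexpectedly throws an error when I try to read the whole file.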
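To make the target format concrete, here is a minimal sketch of the per-year aggregation. It assumes the `Year` and `Runtime` columns from the header above, assumes runtimes always look like "1 min", and uses helper names (`parseRuntime`, `createStats`, `addMovie`, `toSeries`) that are made up for illustration and are not taken from the gist. The output shape ([year, min, max] for the range series and [year, average] for the line series) is what I understand Highcharts to expect.

```javascript
// Turn a runtime such as "1 min" into a number of minutes.
// Assumption: every usable value has the form "<digits> min"; anything else yields null.
function parseRuntime(text) {
  const match = /^(\d+)\s*min$/.exec(String(text).trim());
  return match ? Number(match[1]) : null;
}

// Running totals per year, keyed by the numeric year.
function createStats() {
  return new Map();
}

// Fold one movie row (an object with Year and Runtime strings) into the totals.
function addMovie(stats, row) {
  const year = Number.parseInt(row.Year, 10);
  const minutes = parseRuntime(row.Runtime);
  if (!Number.isInteger(year) || minutes === null) return; // skip incomplete rows
  const s = stats.get(year) || { min: Infinity, max: -Infinity, sum: 0, count: 0 };
  s.min = Math.min(s.min, minutes);
  s.max = Math.max(s.max, minutes);
  s.sum += minutes;
  s.count += 1;
  stats.set(year, s);
}

// Produce the two arrays, sorted by year.
function toSeries(stats) {
  const years = [...stats.keys()].sort((a, b) => a - b);
  return {
    ranges: years.map((y) => [y, stats.get(y).min, stats.get(y).max]),
    averages: years.map((y) => [y, stats.get(y).sum / stats.get(y).count]),
  };
}
```

Because only four numbers are kept per year, this state stays small no matter how many rows are fed through `addMovie`.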
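I'm well aware that streams should be able to help here, but my expertise is limited, and this little project is actually meant to help me get a better grasp of streams...
Here is the script as it currently stands: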
https://gist.github.com/jfix/f79f011ce99d2049613c
If it is preferable to show the whole script inline in my question, I can of course add it.
This is the error that is thrown:
$ node each.js
buffer.js:382
throw new Error('toString failed');
^
Error: toString failed
at Buffer.toString (buffer.js:382:11)
at StringDecoder.write (string_decoder.js:129:21)
at Parser._transform (/Users/jakob/Projects/imdb-film-length/node_modules/csv-parse/lib/index.js:154:26)
at Transform._read (_stream_transform.js:167:10)
at Transform._write (_stream_transform.js:155:12)
at doWrite (_stream_writable.js:292:12)
at writeOrBuffer (_stream_writable.js:278:5)
at Writable.write (_stream_writable.js:207:11)
at /Users/jakob/Projects/imdb-film-length/node_modules/csv-parse/lib/index.js:46:14
at doNTCallback0 (node.js:419:9)
Thanks for any pointers in the right direction...
I tried to reproduce your situation, and I got the same error just by running:
csv(file, {delimiter: tab, relax: true, columns: true}, (err, out) => { });
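So it seems that the csv-parse module makes the process run out of memory, because the callback form collects all records into one huge array before handing them over. You probably need to use the stream API of the csv-parse module instead. An example is described here: http://csv.adaltas.com/parse/examples/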
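As a rough sketch of the streaming idea, here is a dependency-free variant. Note that it swaps in Node's built-in `readline` module rather than the csv-parse stream API linked above, so it only illustrates the principle of handling one record at a time. The names `toRecord` and `streamTsv` and the file name `omdb.tsv` are hypothetical, the plain split on tabs assumes no field contains a tab or relies on quoting, and the `for await` loop assumes a reasonably recent Node version.

```javascript
const fs = require('fs');
const readline = require('readline');

// Map one tab-separated line onto the column names from the header line.
// Missing trailing fields become empty strings.
function toRecord(columns, line) {
  const values = line.split('\t');
  const record = {};
  columns.forEach((name, i) => {
    record[name] = values[i] !== undefined ? values[i] : '';
  });
  return record;
}

// Read the file at `path` line by line and pass each record to `onRecord`.
// Only the current line is held in memory, never the whole file.
async function streamTsv(path, onRecord) {
  const lines = readline.createInterface({
    input: fs.createReadStream(path, { encoding: 'utf8' }),
    crlfDelay: Infinity, // treat \r\n as a single line break
  });
  let columns = null;
  for await (const line of lines) {
    if (columns === null) {
      columns = line.split('\t'); // the first line is the header
      continue;
    }
    if (line !== '') onRecord(toRecord(columns, line));
  }
}

// Example use (hypothetical file name): count the rows that have a Runtime value.
// let withRuntime = 0;
// streamTsv('omdb.tsv', (r) => { if (r.Runtime) withRuntime += 1; })
//   .then(() => console.log(withRuntime));
```

The point of the design is that `onRecord` updates a small running summary and the record is then discarded, so no array with a million entries is ever built.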