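I'm studying Hadoop, and I'd like each call to the map function to work on multiple lines. I found the property mapreduce.input.lineinputformat.linespermap, but as far as I understand it, that sets the number of lines given to a single mapper, not the number of lines passed to each map function call. How can I do this? Thanks in advance.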
1) You have to write a custom text input format.
2) You will have to create your own custom record reader for it and implement the logic there.
You will extend the TextInputFormat class to create your own NLinesInputFormat.
You will also create your own RecordReader class, called NLinesRecordReader, where you implement the logic of feeding 3 lines/records at a time to each map call.
You will then change the driver program to use the new NLinesInputFormat class.
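To make the record reader logic concrete, here is a minimal, Hadoop-free sketch of the grouping that NLinesRecordReader would perform. The class name NLinesGrouper is hypothetical and does not come from the linked article. It only mirrors the nextKeyValue / getCurrentKey / getCurrentValue contract of a RecordReader, and it uses the index of the first line as the key, where a reader built on Hadoop's line reader would normally use a byte offset. In a real job this logic would live in a class extending RecordReader<LongWritable, Text>, returned from NLinesInputFormat.createRecordReader, and the driver would select it with job.setInputFormatClass(NLinesInputFormat.class).

```java
import java.util.Arrays;
import java.util.Iterator;

/** Hadoop-free model of the grouping logic for a hypothetical NLinesRecordReader. */
class NLinesGrouper {
    private final Iterator<String> lines;   // stands in for the underlying line reader
    private final int linesPerRecord;       // how many lines each map() call should see
    private long nextLineIndex = 0;         // index of the next unread line
    private long currentKey = -1;           // index of the first line in the current record
    private String currentValue = null;     // the grouped lines, joined with '\n'

    NLinesGrouper(Iterator<String> lines, int linesPerRecord) {
        if (linesPerRecord < 1) {
            throw new IllegalArgumentException("linesPerRecord must be >= 1");
        }
        this.lines = lines;
        this.linesPerRecord = linesPerRecord;
    }

    /** Advances to the next record; returns false once the input is exhausted. */
    boolean nextKeyValue() {
        StringBuilder record = new StringBuilder();
        long firstLine = nextLineIndex;
        int count = 0;
        // Pull up to linesPerRecord lines; the last record may be shorter.
        while (count < linesPerRecord && lines.hasNext()) {
            if (count > 0) {
                record.append('\n');
            }
            record.append(lines.next());
            count++;
            nextLineIndex++;
        }
        if (count == 0) {
            currentValue = null;
            return false;
        }
        currentKey = firstLine;
        currentValue = record.toString();
        return true;
    }

    long getCurrentKey() {
        return currentKey;
    }

    String getCurrentValue() {
        return currentValue;
    }

    public static void main(String[] args) {
        NLinesGrouper grouper =
            new NLinesGrouper(Arrays.asList("a", "b", "c", "d", "e").iterator(), 3);
        while (grouper.nextKeyValue()) {
            // Each iteration corresponds to one map() call in a real job.
            System.out.println(grouper.getCurrentKey() + " -> "
                + grouper.getCurrentValue().replace("\n", " | "));
        }
    }
}
```

With five input lines and a group size of 3, the loop runs twice: the first record holds lines a, b, c and the second holds the remaining d, e. In the mapper, the value can then be split on the newline character to get the individual lines back. Note that this sketch ignores input splits; a real record reader also has to decide what to do when a group of lines straddles a split boundary.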
Please follow this link for the complete details: http://bigdatacircus.com/2012/08/01/wordcount-with-custom-record-reader-of-textinputformat/