Processing multiple lines in a single map function call

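I'm studying Hadoop and I want each call to the map function to work on multiple lines. I found the property mapreduce.input.lineinputformat.linespermap, but as far as I understand it sets the number of lines per mapper, not the number of lines per map function call. How can I do this? Thanks in advance.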

1) You have to write a custom text input format.

2) You have to create your own custom record reader for it and implement the logic there.

Extend the TextInputFormat class to create your own NLinesInputFormat.
Also create your own RecordReader class, called NLinesRecordReader, where you implement the logic of feeding 3 lines/records at a time to the map function.
Change your driver program to use the new NLinesInputFormat class.
See the following link for complete details: http://bigdatacircus.com/2012/08/01/wordcount-with-custom-record-reader-of-textinputformat/
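The core of such a record reader is the grouping step: read up to N lines and hand them to the map function as one value. Below is a minimal sketch of that logic using only the JDK, so it can be run without a Hadoop installation. The class name NLinesGrouper is hypothetical and is not taken from the linked article. Its method names deliberately mirror the nextKeyValue() / getCurrentKey() / getCurrentValue() contract of Hadoop's RecordReader. As a simplification, the key here is the index of the first line in the group, whereas a reader built on Hadoop's line reader would normally use a byte offset.

```java
import java.util.Arrays;
import java.util.Iterator;

// Hypothetical, JDK-only sketch of the grouping logic of an "N lines" record reader.
// It imitates the RecordReader contract but does not depend on Hadoop.
class NLinesGrouper {
    private final Iterator<String> lines;   // source of input lines
    private final int linesPerRecord;       // N: lines handed to each map call
    private long nextLineIndex = 0;         // index of the next unread line
    private long currentKey = -1;           // index of the first line in the current record
    private String currentValue = null;     // the grouped lines, joined with '\n'

    NLinesGrouper(Iterator<String> lines, int linesPerRecord) {
        if (linesPerRecord < 1) {
            throw new IllegalArgumentException("linesPerRecord must be >= 1");
        }
        this.lines = lines;
        this.linesPerRecord = linesPerRecord;
    }

    // Advances to the next record; returns false once the input is exhausted.
    boolean nextKeyValue() {
        if (!lines.hasNext()) {
            currentKey = -1;
            currentValue = null;
            return false;
        }
        long firstLine = nextLineIndex;
        StringBuilder joined = new StringBuilder();
        int count = 0;
        while (count < linesPerRecord && lines.hasNext()) {
            if (count > 0) {
                joined.append('\n');
            }
            joined.append(lines.next());
            count++;
            nextLineIndex++;
        }
        currentKey = firstLine;
        currentValue = joined.toString();   // the last record may hold fewer than N lines
        return true;
    }

    long getCurrentKey() {
        return currentKey;
    }

    String getCurrentValue() {
        return currentValue;
    }

    public static void main(String[] args) {
        Iterator<String> input = Arrays.asList("a", "b", "c", "d", "e").iterator();
        NLinesGrouper grouper = new NLinesGrouper(input, 3);
        while (grouper.nextKeyValue()) {
            // Each iteration corresponds to one map() call receiving up to 3 lines.
            System.out.println(grouper.getCurrentKey() + " -> "
                    + grouper.getCurrentValue().replace("\n", " | "));
        }
    }
}
```

In an actual job, this logic would presumably live in the nextKeyValue() method of NLinesRecordReader (a subclass of RecordReader<LongWritable, Text>), NLinesInputFormat would return that reader from createRecordReader(), and the driver would select the format with job.setInputFormatClass(NLinesInputFormat.class). The map function then splits the received value on newlines to get the individual lines.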
