How to unit test custom RecordReader and InputFormat classes

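I have developed a MapReduce program, and I have written custom RecordReader and InputFormat classes.

I am using MRUnit and Mockito to unit test the mapper and reducer.

How can I unit test the custom RecordReader and InputFormat classes? What is the best way to test them?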

Thanks to user7610.

A compiled and tested version of the sample code from the answer:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.InputFormat;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.TaskAttemptID;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;
import org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl;
import org.apache.hadoop.util.ReflectionUtils;
import java.io.File;

// Inside a test method; MyInputFormat is your own InputFormat class.
// Point the default file system at the local disk so no HDFS cluster is needed
// (fs.default.name is the deprecated name of fs.defaultFS).
Configuration conf = new Configuration(false);
conf.set("fs.default.name", "file:///");
File testFile = new File("path/to/file");
Path path = new Path(testFile.getAbsoluteFile().toURI());
// One split covering the whole file: offset 0, length = file size, no host hints.
FileSplit split = new FileSplit(path, 0, testFile.length(), null);
InputFormat inputFormat = ReflectionUtils.newInstance(MyInputFormat.class, conf);
TaskAttemptContext context = new TaskAttemptContextImpl(conf, new TaskAttemptID());
RecordReader reader = inputFormat.createRecordReader(split, context);
// createRecordReader does not initialize the reader (the framework normally
// does that), so call initialize before reading records.
reader.initialize(split, context);

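You need a working test file (I assume your input format extends FileInputFormat). Once you have that, configure the Configuration object to use the LocalFileSystem by setting fs.default.name (or fs.defaultFS) to file:///. Finally, define a FileSplit that contains the path to the file, plus the offset and length of the portion of the file that the split covers.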
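The test file does not have to be checked in; the test can generate it. The following is a JDK-only sketch, assuming newline-delimited records: it writes known records to a temporary file (plus a gzip copy for the compression case), and cuts the file length into contiguous offset/length ranges for building splits. The names `FixtureFiles`, `writeFixture`, `writeGzipFixture` and `splitRanges` are hypothetical; adapt `toBytes` to your own record format.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPOutputStream;

/** Hypothetical helpers for building RecordReader test fixtures (JDK only). */
final class FixtureFiles {
    private FixtureFiles() {}

    /** Encodes each record as one '\n'-terminated UTF-8 line (assumed format). */
    static byte[] toBytes(List<String> records) {
        StringBuilder sb = new StringBuilder();
        for (String r : records) {
            sb.append(r).append('\n');
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    /** Writes the records to a new temporary file, deleted when the JVM exits. */
    static Path writeFixture(List<String> records) throws IOException {
        Path file = Files.createTempFile("recordreader-test", ".txt");
        file.toFile().deleteOnExit();
        Files.write(file, toBytes(records));
        return file;
    }

    /** Writes a gzip-compressed copy; the ".gz" suffix lets codec lookup by name work. */
    static Path writeGzipFixture(List<String> records) throws IOException {
        Path file = Files.createTempFile("recordreader-test", ".txt.gz");
        file.toFile().deleteOnExit();
        try (OutputStream out = new GZIPOutputStream(Files.newOutputStream(file))) {
            out.write(toBytes(records));
        }
        return file;
    }

    /** Cuts [0, fileLength) into contiguous {start, length} ranges of at most splitSize bytes. */
    static List<long[]> splitRanges(long fileLength, long splitSize) {
        if (splitSize <= 0) {
            throw new IllegalArgumentException("splitSize must be positive");
        }
        List<long[]> ranges = new ArrayList<>();
        for (long start = 0; start < fileLength; start += splitSize) {
            ranges.add(new long[] {start, Math.min(splitSize, fileLength - start)});
        }
        return ranges;
    }
}
```

Each `{start, length}` pair can then become `new FileSplit(new org.apache.hadoop.fs.Path(file.toUri()), start, length, null)`, fully qualified to avoid the clash with `java.nio.file.Path`. The `.gz` suffix only matters if your reader, like Hadoop's `LineRecordReader`, picks a decompression codec from the file name via `CompressionCodecFactory`; since gzip is not splittable, test the compressed copy with a single split covering the whole file.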

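// DISCLAIMER: not tested or compiled
Configuration conf = new Configuration(false);
conf.set("fs.default.name", "file:///");
File testFile = new File("path/to/file");
// FileSplit takes a Hadoop Path (not a String); File's size method is length().
FileSplit split = new FileSplit(
       new Path(testFile.getAbsoluteFile().toURI()), 0,
       testFile.length(), null);
MyInputFormat inputFormat = ReflectionUtils.newInstance(MyInputFormat.class, conf);
// Hadoop 1.x: TaskAttemptContext is a class. In Hadoop 2+ it is an interface,
// so use new TaskAttemptContextImpl(conf, new TaskAttemptID()) instead.
TaskAttemptContext context = new TaskAttemptContext(conf, new TaskAttemptID());
RecordReader reader = inputFormat.createRecordReader(split, context);
reader.initialize(split, context);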

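You can now assert that the records returned by the reader match the records you expect. If your file format supports it, you should also test with different split offsets and lengths, and with a compressed version of the file.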
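Varying the split offset and length matters because a record that straddles a split boundary must be read by exactly one split. The JDK-only sketch below is a simplified model of the convention Hadoop's `LineRecordReader` uses for uncompressed newline-delimited text: a split other than the first discards its first (possibly partial) line, and every split keeps reading while the next line starts at or before its end offset. It then checks that, for every split size, the concatenated records equal the expected list. `SplitBoundaryCheck`, `readSplit` and `allSplitSizesPreserveRecords` are hypothetical names; in a real test you would replace `readSplit` with a loop over your own reader (`createRecordReader`, `initialize`, then `nextKeyValue()` / `getCurrentValue()`) on a `FileSplit` with the same offset and length.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified model of newline-delimited split handling (JDK only). */
final class SplitBoundaryCheck {
    private SplitBoundaryCheck() {}

    /** Returns the records owned by the split [start, start + length) of data. */
    static List<String> readSplit(byte[] data, int start, int length) {
        int end = start + length;
        int pos = start;
        if (start != 0) {
            // Not the first split: discard the first (possibly partial) line,
            // because the previous split reads one line past its end.
            while (pos < data.length && data[pos] != '\n') {
                pos++;
            }
            pos++; // step over the newline (or past the end of the data)
        }
        List<String> records = new ArrayList<>();
        // Keep reading while the next line starts at or before the split end.
        while (pos <= end && pos < data.length) {
            int lineEnd = pos;
            while (lineEnd < data.length && data[lineEnd] != '\n') {
                lineEnd++;
            }
            records.add(new String(data, pos, lineEnd - pos, StandardCharsets.UTF_8));
            pos = lineEnd + 1;
        }
        return records;
    }

    /** For every split size, the records of all splits, in order, must equal expected. */
    static boolean allSplitSizesPreserveRecords(byte[] data, List<String> expected) {
        for (int splitSize = 1; splitSize <= data.length; splitSize++) {
            List<String> all = new ArrayList<>();
            for (int start = 0; start < data.length; start += splitSize) {
                all.addAll(readSplit(data, start, Math.min(splitSize, data.length - start)));
            }
            if (!all.equals(expected)) {
                return false;
            }
        }
        return true;
    }
}
```

Trying every split size is cheap for a small fixture and covers cuts that land at a line start, on a newline, and in the middle of a record. A reader with an off-by-one in the loop condition (for example `pos < end` instead of `pos <= end`) drops records for some split sizes, which is exactly what this kind of test should catch.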
