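I wrote a map-only Hadoop job in which I use MultipleOutputs. The problem is that I want to test this code with MRUnit, and I haven't found any working example of testing MultipleOutputs.
My mapper code looks like this: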
public void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
    String inputString = value.toString();
    String outputString = null;
    Text resultValue = null;
    String finalResult = null;
    String exceptionMessage = null;
    try {
        outputString = processInput(dataSet, inputString);
    } catch (MalformedURLException me) {
        System.out.println("MalformedURLException Occurred in Mapper:"
                + me.getMessage());
        exceptionMessage = me.getMessage();
    } catch (SolrServerException se) {
        System.out.println("SolrServerException Occurred in Mapper:"
                + se.getMessage());
        exceptionMessage = se.getMessage();
    }
    if (outputString == null || outputString.isEmpty()
            && exceptionMessage != null) {
        // Flatten multi-line error messages onto one line
        exceptionMessage = exceptionMessage.replaceAll("\n", ", ");
        finalResult = inputString + "\t[Error] =" + exceptionMessage;
        resultValue = new Text(finalResult);
        multipleOutputs.write(SearchConstants.FAILURE_FILE, NullWritable.get(), resultValue);
    } else {
        finalResult = inputString + outputString;
        resultValue = new Text(finalResult);
        multipleOutputs.write(SearchConstants.SUCCESS_FILE, NullWritable.get(), resultValue);
    }
}
Can anyone give me a working example of an MRUnit test that uses MultipleOutputs?
Here is an example with a slightly simplified version of your class:
// SomeMapper.java
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;
import java.io.IOException;

public class SomeMapper extends Mapper<LongWritable, Text, NullWritable, Text> {
    public static final String SUCCESS_FILE = "successFile";

    // Instance fields rather than statics, so each mapper instance owns its own state
    private MultipleOutputs<NullWritable, Text> multipleOutputs;
    // Reused across map() calls to avoid allocating a new Text per record
    private final Text result = new Text();

    @Override
    public void setup(Context context) throws IOException, InterruptedException {
        multipleOutputs = new MultipleOutputs<>(context);
        super.setup(context);
    }

    @Override
    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        String outputString = "some result"; // logic here
        result.set(outputString);
        // Write to the named output instead of context.write()
        multipleOutputs.write(SUCCESS_FILE, NullWritable.get(), result);
    }
}

// SomeMapperTest.java
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.powermock.core.classloader.annotations.PrepareForTest;
import org.powermock.modules.junit4.PowerMockRunner;

// Both MultipleOutputs and the mapper that instantiates it are listed in @PrepareForTest
@RunWith(PowerMockRunner.class)
@PrepareForTest({MultipleOutputs.class, SomeMapper.class})
public class SomeMapperTest {
    @Test
    public void someTest() throws Exception {
        MapDriver<LongWritable, Text, NullWritable, Text> mapDriver = MapDriver.newMapDriver(new SomeMapper());
        // Expect exactly one record on the named output SUCCESS_FILE
        mapDriver.withInput(new LongWritable(0), new Text("some input"))
                .withMultiOutput(SomeMapper.SUCCESS_FILE, NullWritable.get(), new Text("some result"))
                .runTest();
    }
}
And build.gradle:
apply plugin: "java"

sourceCompatibility = 1.7
targetCompatibility = 1.7

repositories {
    mavenCentral()
}

dependencies {
    compile "org.apache.hadoop:hadoop-client:2.4.0"
    testCompile "junit:junit:4.12"
    testCompile("org.apache.mrunit:mrunit:1.1.0:hadoop2") {
        exclude(group: "org.mockito")
    }
    testCompile "org.powermock:powermock-module-junit4:1.6.2"
    testCompile "org.powermock:powermock-api-mockito:1.6.2"
}
Note the Mockito exclusion. Without it, I got a java.lang.NoSuchMethodError: org.mockito.mock.MockCreationSettings.getSerializableMode()Lorg/mockito/mock/SerializableMode; exception, because the Hadoop dependencies pull in org.mockito:mockito-core:1.9.5, which conflicts with the Mockito version that PowerMock wants to use.
You can find further examples in MRUnit's own unit test, org.apache.hadoop.mrunit.mapreduce.TestMultipleOutput.
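
A side note on the mapper in the question, separate from the MRUnit setup: because && binds tighter than ||, the condition is evaluated as outputString == null || (outputString.isEmpty() && exceptionMessage != null). If processInput can ever return null without throwing, exceptionMessage is still null in the failure branch and the replaceAll call would throw a NullPointerException. One possible way to guard against that, and to make the branching testable without any Hadoop classes, is to pull the decision into a plain static helper. The sketch below is only an illustration under assumptions: OutputRouting, route, the "failureFile" value, and the "unknown" placeholder are names I made up, and it assumes newlines in the error message should become ", " and the error should be tab-separated from the input.

```java
/**
 * Hypothetical helper (not part of the original mapper): decides which
 * named output a record belongs to and builds the line to write.
 */
final class OutputRouting {
    static final String SUCCESS_FILE = "successFile"; // same value as in SomeMapper
    static final String FAILURE_FILE = "failureFile"; // assumed value for SearchConstants.FAILURE_FILE

    private OutputRouting() {
    }

    /**
     * Returns a two-element array: {namedOutput, lineToWrite}.
     * The condition keeps the same grouping as the question's code, with
     * explicit parentheses, and adds a null guard for the error message.
     */
    static String[] route(String input, String output, String error) {
        boolean failed = output == null || (output.isEmpty() && error != null);
        if (failed) {
            // Placeholder text for a missing message is an assumption
            String message = (error == null) ? "unknown" : error.replaceAll("\n", ", ");
            return new String[] {FAILURE_FILE, input + "\t[Error] =" + message};
        }
        return new String[] {SUCCESS_FILE, input + output};
    }

    public static void main(String[] args) {
        String[] ok = route("q1", "\tdoc42", null);
        System.out.println(ok[0] + " <- " + ok[1]);

        // A null result with no exception no longer throws
        String[] bad = route("q2", null, null);
        System.out.println(bad[0] + " <- " + bad[1]);
    }
}
```

In the mapper you would then call multipleOutputs.write(routed[0], NullWritable.get(), new Text(routed[1])), so the MRUnit test only has to cover the wiring while the branching is covered by ordinary JUnit tests.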