What happens if I don't use the setup() and cleanup() methods in MapReduce?



Suppose I have a mapper like the one below. The mapper class computes a local top 10 for each map task:

import java.io.IOException;
import java.util.Map;
import java.util.TreeMap;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class TopTenMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
    // Crime count -> street name, kept sorted by count (ascending)
    private TreeMap<Long, String> tmap;

    // Called once at the start of the task, before any map() call
    @Override
    public void setup(Context context) throws IOException, InterruptedException {
        tmap = new TreeMap<Long, String>();
    }

    // Called once for each key/value pair in the input split
    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        String[] tokens = value.toString().split("\t");
        String streetName = tokens[0];
        Long numCrimes = Long.parseLong(tokens[1]);
        tmap.put(numCrimes, streetName);
        // Keep only the 10 largest counts by evicting the smallest key
        if (tmap.size() > 10) {
            tmap.remove(tmap.firstKey());
        }
    }

    // Called once at the end of the task
    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        for (Map.Entry<Long, String> entry : tmap.entrySet()) {
            context.write(new Text(entry.getValue()), new LongWritable(entry.getKey()));
        }
    }
}

I understand that setup() is called once before map(), and that cleanup() is called once before the map task exits. But can I put the setup() code at the start of map() instead?

public class TopTenMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
    private TreeMap<Long, String> tmap;

    // Called once for each key/value pair in the input split
    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        tmap = new TreeMap<Long, String>();
        String[] tokens = value.toString().split("\t");
        String streetName = tokens[0];
        Long numCrimes = Long.parseLong(tokens[1]);
        tmap.put(numCrimes, streetName);
        if (tmap.size() > 10) {
            tmap.remove(tmap.firstKey());
        }
        for (Map.Entry<Long, String> entry : tmap.entrySet()) {
            context.write(new Text(entry.getValue()), new LongWritable(entry.getKey()));
        }
    }
}

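I assume tmap is still initialized once each time a map task is initialized, is that right? For what reasons, and in which scenarios, do I have to use the setup() and cleanup() methods?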

map() is called once for each input key/value pair. The TopTenMapper class itself (and therefore setup()) is initialized only once per map task.
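To make the call order concrete, here is a minimal sketch that imitates the lifecycle in plain Java, with no Hadoop dependency. `LifecycleSketch` and `MiniMapper` are hypothetical names invented for this illustration, not part of the Hadoop API; the `run` method only mirrors the order used by Hadoop's `Mapper.run()` (setup, then map per record, then cleanup).

```java
import java.util.Arrays;
import java.util.List;

public class LifecycleSketch {

    // Hypothetical stand-in for a Hadoop Mapper; not the real API.
    static class MiniMapper {
        int setupCalls = 0;
        int mapCalls = 0;
        int cleanupCalls = 0;

        void setup() { setupCalls++; }

        void map(String record) { mapCalls++; }

        void cleanup() { cleanupCalls++; }

        // Same call order as Mapper.run(): setup once, map per record, cleanup once.
        void run(List<String> split) {
            setup();
            try {
                for (String record : split) {
                    map(record);
                }
            } finally {
                cleanup();
            }
        }
    }

    public static void main(String[] args) {
        MiniMapper mapper = new MiniMapper();
        mapper.run(Arrays.asList("a", "b", "c"));
        // Prints "setup=1 map=3 cleanup=1"
        System.out.println("setup=" + mapper.setupCalls
                + " map=" + mapper.mapCalls
                + " cleanup=" + mapper.cleanupCalls);
    }
}
```

Anything that must live across records, such as the TreeMap here, belongs in the once-per-task part of that sequence.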

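In the second example, because tmap = new TreeMap<Long, String>(); sits inside map(), you never actually get the top ten: the map only ever holds one value, so every input record is simply written back out.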
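The difference can be reproduced without a cluster. The sketch below is a plain-Java simulation, not Hadoop code: `TopNStateDemo`, `perTaskState`, `perCallState`, and the sample street data are all made up for illustration. It applies the same TreeMap logic once with state that lives for the whole task and once with state recreated for every record.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class TopNStateDemo {

    // Mirrors the first mapper: state created once, updated per record, emitted once.
    static List<String> perTaskState(List<String> records, int n) {
        List<String> out = new ArrayList<>();
        TreeMap<Long, String> tmap = new TreeMap<>();          // setup(): once per task
        for (String record : records) {                        // map(): once per record
            String[] tokens = record.split("\t");
            tmap.put(Long.parseLong(tokens[1]), tokens[0]);
            if (tmap.size() > n) {
                tmap.remove(tmap.firstKey());                  // evict the smallest count
            }
        }
        for (Map.Entry<Long, String> e : tmap.entrySet()) {    // cleanup(): once per task
            out.add(e.getValue() + "\t" + e.getKey());
        }
        return out;
    }

    // Mirrors the second mapper: state recreated and emitted inside every map() call.
    static List<String> perCallState(List<String> records, int n) {
        List<String> out = new ArrayList<>();
        for (String record : records) {
            TreeMap<Long, String> tmap = new TreeMap<>();      // reset on every record
            String[] tokens = record.split("\t");
            tmap.put(Long.parseLong(tokens[1]), tokens[0]);
            if (tmap.size() > n) {                             // never true: size is always 1
                tmap.remove(tmap.firstKey());
            }
            for (Map.Entry<Long, String> e : tmap.entrySet()) {
                out.add(e.getValue() + "\t" + e.getKey());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Invented sample input in the same "street<TAB>count" format
        List<String> records = Arrays.asList("A St\t5", "B St\t9", "C St\t2", "D St\t7");
        System.out.println(perTaskState(records, 2).size());   // 2 entries: the top two
        System.out.println(perCallState(records, 2).size());   // 4 entries: every record
    }
}
```

With a limit of 2, the per-task version returns only the two largest counts, while the per-call version returns all four records unchanged, which is the behavior described above.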
