Hadoop data and control flow

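I am writing a Hadoop application, but I seem to have misunderstood how Hadoop actually works. My input files are map tiles, named according to the QuadTile principle. I need to subsample them and stitch them together until I have a certain higher-level tile that covers a larger area at a lower resolution, like zooming out in Google Maps.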

One of the things I have done is write a mapper that executes on every (unsplittable) tile, like this:

    public void map(Text keyT, ImageWritable value, Context context) throws IOException, InterruptedException {
        String key = keyT.toString();
        //check whether file needs to be processed
        if(key.startsWith(context.getJobName(), 0)){
            String newKey = key.substring(0, key.length()-1);
            ImageWritable iw = subSample(value);
            char region = key.charAt(key.length()-1);
            iw.setRegion(region);
            context.write(new Text(newKey), iw);
        }else{
            //tile not needed in calculation
        }
    }

My reducer looks like this:

    public void reduce(Text key, Iterable<ImageWritable> values, Context context) throws IOException, InterruptedException{
        ImageWritable higherLevelTile = new ImageWritable();
        int i = 0;
        for(ImageWritable s : values){
            int width = s.getWidth();
            int height = s.getHeight();
            char c = Character.toUpperCase(s.getRegion());
            int basex=0, basey=0;
            if(c=='A'){
                basex = basey = 0;
            }else if(c=='B'){
                basex = width;
                basey = 0;
            }else if(c=='C'){
                basex = 0;
                basey = height;
            }else{
                basex = width;
                basey = height;
            }
            BufferedImage toDraw = s.getBufferedImage();
            Graphics g = higherLevelTile.getBufferedImage().getGraphics();
            g.drawImage(toDraw, basex, basey, null);
        }
        context.write(key, higherLevelTile);
    }

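As you can probably derive from my code, I expected Hadoop to execute in the following way:

1) Map all tiles of level one.
2) Do a first reduce. Here I expected the Iterable values to have four elements: the four subsampled tiles of the lower level.
3) Map all tiles currently in the context.
4) Reduce all tiles in the context. Again, the Iterable values will have four elements...
5) ... repeat ...
6) When there are no more maps left -> write output.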

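It turns out that this is not correct. My reducer is called after every map, and the Iterable never seems to have more than one element. I tried to fix that by altering the reducer code to assume that the Iterable would have two elements: one subsampled value and one partially finished higher-level tile. That turned out not to be correct either.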

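Can anyone tell me, or point me towards, what the actual flow of Hadoop is? What should I do to make my use case work? I hope I have explained it clearly.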

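Your assumption is right that all of the maps complete before the first reduce starts. That is because each reduce is guaranteed to get its input in sorted order, and the last map to finish may produce the first key for any of the reduces.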

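Each map produces its output, and a pluggable interface called the partitioner picks the reduce that should receive each key. The default uses key.hashCode() % num_reduces, because that gives a good distribution in the normal case. That may be your problem, since there is no requirement that "A", "AB", and "ABC" go to the same reduce.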
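To make the partitioning rule concrete, here is a minimal sketch in plain Java with no Hadoop dependency. It assumes String.hashCode() as a stand-in for the key's hash (Hadoop's Text computes its hash differently), and the class name PartitionSketch and the reducer count of 4 are made up for illustration. The sign-bit mask only keeps the result non-negative.

```java
public class PartitionSketch {
    // Hash-partitioning rule: clear the sign bit so the value is
    // non-negative, then take the remainder by the number of reduces.
    public static int partitionFor(String key, int numReduces) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduces;
    }

    public static void main(String[] args) {
        int numReduces = 4; // assumed reducer count, for illustration only
        for (String key : new String[] {"A", "AB", "ABC"}) {
            System.out.println(key + " -> reduce " + partitionFor(key, numReduces));
        }
    }
}
```

Each key is hashed on its own, so a shared prefix gives no guarantee that two different keys land on the same reduce.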

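Finally, each reduce is called once for each of its keys. The iterator goes through the values that were associated with the same key. Note that the values are generally unsorted, but that can be controlled using a secondary sort.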
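The flow described above can be imitated in plain Java, without Hadoop, to show what a reduce call receives. This is only a sketch: the class ShuffleSketch, the method mapAndShuffle, and the sample tile names are hypothetical, and the map step simply copies the key handling from the question's mapper.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ShuffleSketch {
    // Map step as in the question: the key minus its last character is the
    // parent tile, and the last character is the region. The grouping by
    // parent key stands in for the shuffle between map and reduce.
    public static Map<String, List<Character>> mapAndShuffle(List<String> tileKeys) {
        // TreeMap keeps keys sorted, like the sorted input a reduce receives.
        Map<String, List<Character>> grouped = new TreeMap<>();
        for (String key : tileKeys) {
            String parent = key.substring(0, key.length() - 1);
            char region = key.charAt(key.length() - 1);
            grouped.computeIfAbsent(parent, k -> new ArrayList<>()).add(region);
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<String> tiles = Arrays.asList(
                "AAA", "AAB", "AAC", "AAD", "ABA", "ABB", "ABC", "ABD");
        // All map output is grouped before any "reduce" below runs.
        for (Map.Entry<String, List<Character>> e : mapAndShuffle(tiles).entrySet()) {
            // One call per key; the list holds every value emitted for that key.
            System.out.println("reduce(" + e.getKey() + ", " + e.getValue() + ")");
        }
    }
}
```

With these eight sample names the sketch makes two reduce calls, for AA and AB, each seeing four regions. A single MapReduce job runs this map, shuffle, reduce sequence once; the reduce output is not passed back to the mapper, so each further zoom level would need another job.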

Take a look at: http://riccomini.name/posts/hadoop/2009-11-13-sort-reducer-input-value-hadoop/

If you want an example of a secondary sort, I wrote one and put it in the Hadoop examples: http://svn.apache.org/repos/asf/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-examples/src/main/java/org/apache/hadoop/examples/SecondarySort.java
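The idea behind a secondary sort can be sketched in plain Java: sort on a composite key (natural key plus the field that should order the values), then group on the natural key alone. This is not the linked SecondarySort.java, only an illustration of the principle; TileKey, sortAndGroup, and the sample data are hypothetical names chosen to match the tile example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SecondarySortSketch {
    // Hypothetical composite key: parent tile (natural key) plus region letter.
    public static final class TileKey {
        final String parent;
        final char region;

        public TileKey(String parent, char region) {
            this.parent = parent;
            this.region = region;
        }
    }

    // Sort on the full composite key, then group on the natural key only,
    // so each group's values arrive ordered by region.
    public static Map<String, List<Character>> sortAndGroup(List<TileKey> keys) {
        List<TileKey> sorted = new ArrayList<>(keys);
        sorted.sort(Comparator.comparing((TileKey k) -> k.parent)
                .thenComparingInt((TileKey k) -> k.region));
        Map<String, List<Character>> grouped = new LinkedHashMap<>();
        for (TileKey k : sorted) {
            grouped.computeIfAbsent(k.parent, p -> new ArrayList<>()).add(k.region);
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<TileKey> keys = Arrays.asList(
                new TileKey("AA", 'D'), new TileKey("AB", 'B'), new TileKey("AA", 'A'),
                new TileKey("AA", 'C'), new TileKey("AB", 'A'), new TileKey("AA", 'B'));
        System.out.println(sortAndGroup(keys));
    }
}
```

In Hadoop itself the same effect is usually obtained with a composite key, a partitioner that looks only at the natural key, and a grouping comparator; the linked example shows a real job.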
