Hadoop MapReduce: finding vice-versa (mutual) pairs

I have the sample data below, which I am using to learn Hadoop MapReduce. It is follower and followee data.

Follower,followee   
    a,b
    a,c
    a,d
    c,b
    b,d
    d,a
    b,c
    b,e
    e,f

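That is, a follows b, a follows c, and so on.

I am trying to process the data so that, if a follows b and b also follows a, then a,b appears in the output text file. I am new to MapReduce and am trying to find a way to get the result below.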

 a,d
 c,b

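You can use a trick to achieve this.

The trick is to build the key passed to the reducer in such a way that (a,d) and (d,a) both get the same key and therefore end up in the same reducer call:

When (a,d) arrives: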

'a' < 'd', hence emit:
key => a,d
value => a,d

When (d,a) arrives:

'd' > 'a', hence emit:
key => a,d
value => d,a

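The key is always formed with the lower letter before the higher one, so for both of these records the key is "a,d". The value keeps the original order, so the reducer can still tell the two directions apart.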
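The key rule above can be sketched as a small plain-Java helper. The class name `PairKey` and the method name `canonicalKey` are made up for illustration, and the sketch assumes the identifiers compare case-sensitively with `String.compareTo`:

```java
final class PairKey {
    // Hypothetical helper: builds the same key for (x,y) and (y,x)
    // by always putting the lexicographically smaller string first.
    static String canonicalKey(String follower, String followee) {
        return follower.compareTo(followee) <= 0
                ? follower + "," + followee
                : followee + "," + follower;
    }

    public static void main(String[] args) {
        System.out.println(canonicalKey("a", "d")); // prints a,d
        System.out.println(canonicalKey("d", "a")); // prints a,d
    }
}
```

In a real job this would be the key the mapper emits, with the unmodified input record as the value.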

So the mapper output is:

Record: a,b
Key = a,b  Value = a,b
Record: a,c
Key = a,c  Value = a,c
Record: a,d
Key = a,d  Value = a,d
Record: c,b
Key = b,c  Value = c,b
Record: b,d
Key = b,d  Value = b,d
Record: d,a
Key = a,d  Value = d,a
Record: b,c
Key = b,c  Value = b,c
Record: b,e
Key = b,e  Value = b,e
Record: e,f
Key = e,f  Value = e,f

Now, at the reducer, the records arrive grouped by key, in the following order:

Record 1: 
    Key = a,b  Value = a,b
Record 2: 
    Key = a,c  Value = a,c
Record 3: 
    Key = a,d  Value = a,d
    Key = a,d  Value = d,a
Record 4: 
    Key = b,c  Value = c,b
    Key = b,c  Value = b,c
Record 5: 
    Key = b,d  Value = b,d
Record 6: 
    Key = b,e  Value = b,e
Record 7: 
    Key = e,f  Value = e,f

So in the reducer you only need to pick out records 3 and 4, the two keys that received both directions of the pair:

Record 3: 
    Key = a,d  Value = a,d
    Key = a,d  Value = d,a
Record 4: 
    Key = b,c  Value = c,b
    Key = b,c  Value = b,c

So the output will be:

a,d
c,b
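The map, group and reduce steps above can be tried out without a cluster. The following is only an in-memory sketch in plain Java, not Hadoop code: the class `MutualFollowSketch` and its methods are hypothetical names, a `TreeMap` stands in for the shuffle, and values are collected in a set on the assumption that a duplicated input line should not count as the reverse direction.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

final class MutualFollowSketch {
    // Same key rule as in the answer: smaller string first.
    static String canonicalKey(String a, String b) {
        return a.compareTo(b) <= 0 ? a + "," + b : b + "," + a;
    }

    // Simulates map -> shuffle -> reduce for "follower,followee" lines.
    static List<String> mutualPairs(List<String> records) {
        // Stand-in for the shuffle: values grouped per key, keys sorted.
        Map<String, Set<String>> groups = new TreeMap<>();
        for (String record : records) {
            String[] parts = record.trim().split(",");
            String key = canonicalKey(parts[0], parts[1]);
            // "Map" step: key is normalized, value keeps the original order.
            groups.computeIfAbsent(key, k -> new LinkedHashSet<>())
                  .add(parts[0] + "," + parts[1]);
        }
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Set<String>> group : groups.entrySet()) {
            // "Reduce" step: two distinct values mean both x,y and y,x were seen.
            if (group.getValue().size() == 2) {
                result.add(group.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> input = List.of(
                "a,b", "a,c", "a,d", "c,b", "b,d", "d,a", "b,c", "b,e", "e,f");
        mutualPairs(input).forEach(System.out::println);
    }
}
```

This sketch emits the normalized key, so the second pair is printed as b,c rather than c,b; emitting one of the stored values instead would reproduce the order shown above. In an actual job, the loop body would go into `Mapper.map` and the size check into `Reducer.reduce`.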

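This logic also works if you are using names instead of single letters. For example, you could use the following logic on the mapper side (where s1 is the first string and s2 is the second):

String key = "";
int compare = s1.compareToIgnoreCase(s2);
// Put the lexicographically smaller string first, as in the a,d example.
if (compare <= 0)
    key = s1 + "," + s2;
else
    key = s2 + "," + s1;

So, if you have:

String s1 = "Stack";
String s2 = "Overflow";

the key is:

Overflow,Stack

Similarly, if you have:

s1 = "Overflow";
s2 = "Stack";

the key is still:

Overflow,Stack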
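One caveat with the snippet above: `compareToIgnoreCase` only affects the ordering, while the key itself keeps the original casing. If the same name can appear with different casing in different records (an assumption about the data, not something the question states), the two directions would get different keys. A minimal sketch of one way around that, using a hypothetical helper that lowercases both names before building the key:

```java
import java.util.Locale;

final class CaseInsensitiveKey {
    // Hypothetical helper: normalizes case first, then orders the pair,
    // so "Stack,overflow" and "OVERFLOW,stack" share one key.
    static String canonicalKey(String s1, String s2) {
        String a = s1.toLowerCase(Locale.ROOT);
        String b = s2.toLowerCase(Locale.ROOT);
        return a.compareTo(b) <= 0 ? a + "," + b : b + "," + a;
    }

    public static void main(String[] args) {
        System.out.println(canonicalKey("Stack", "Overflow")); // prints overflow,stack
        System.out.println(canonicalKey("overflow", "STACK")); // prints overflow,stack
    }
}
```

If names are case-sensitive identifiers, plain `compareTo` without lowercasing is the safer choice.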
