Hadoop reduce-side join with the reducer context and another input file

I have the following simple reducer:

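private final IntWritable count = new IntWritable(); // reused output value

@Override
protected void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
    int sum = 0;           // total purchase amount for this customer
    int numPurchases = 0;  // number of purchase records for this customer
    for (IntWritable val : values) {
        sum += val.get();  // read the int directly instead of parsing its string form
        numPurchases++;
    }
    count.set(sum / numPurchases); // integer division: the average is truncated
    context.write(key, count);
}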

The reducer above simply writes the following to the output:

customerId | avgPurchasePrice

The reducer above reads its data from a single file, File1. I have two questions:

1) Can I add the number of purchases, numPurchases, to the output file? Any suggestions on how to achieve this are much appreciated.

2) I now have another file, File2, which basically contains the following:

customerId | customerName | customerPhone | customerAddress

Can I do a reduce-side join so that the output file has the following format:

customerId | name | phone | avgPurchasePrice | totalPurchases

If so, could you point me to some examples?

I would suggest doing it this way:

Create two custom types, CustomerKey and PurchaseSummary.

1) CustomerKey: holds the customer ID, name, and phone number. It should implement WritableComparable.

Implement public int compareTo so that it compares by customerID.
Override the toString method.

2) PurchaseSummary: holds the average purchase price and the total number of purchases. It can implement Writable.

Override the toString method.
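
A minimal sketch of the two types, assuming the fields are plain strings and numbers. The Hadoop interfaces (org.apache.hadoop.io.WritableComparable and org.apache.hadoop.io.Writable) are left out of the declarations only so the sketch compiles without Hadoop on the classpath; the write/readFields/compareTo signatures match what those interfaces require, so in the real job you would add the implements clause. The field names and the " | " separator are illustrative assumptions. Beyond the steps listed above, the sketch also bases hashCode and equals on the customer ID alone: the default HashPartitioner picks a reducer from the key's hashCode, so a hashCode that also included name or phone could send one customer's records to different reducers.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

// In the job: class CustomerKey implements WritableComparable<CustomerKey>.
// Declared as Comparable here only so the sketch compiles without Hadoop jars.
class CustomerKey implements Comparable<CustomerKey> {
    private String customerId = "";
    private String name = "";
    private String phone = "";

    public CustomerKey() {} // Hadoop instantiates keys reflectively, so keep a no-arg constructor

    public CustomerKey(String customerId, String name, String phone) {
        this.customerId = customerId;
        this.name = name;
        this.phone = phone;
    }

    // Serialize the fields in a fixed order...
    public void write(DataOutput out) throws IOException {
        out.writeUTF(customerId);
        out.writeUTF(name);
        out.writeUTF(phone);
    }

    // ...and read them back in the same order.
    public void readFields(DataInput in) throws IOException {
        customerId = in.readUTF();
        name = in.readUTF();
        phone = in.readUTF();
    }

    // Sort and group by customer ID only.
    @Override
    public int compareTo(CustomerKey other) {
        return customerId.compareTo(other.customerId);
    }

    // Consistent with compareTo: two keys with the same ID are equal and hash alike.
    @Override
    public boolean equals(Object o) {
        return o instanceof CustomerKey && customerId.equals(((CustomerKey) o).customerId);
    }

    @Override
    public int hashCode() {
        return customerId.hashCode();
    }

    @Override
    public String toString() {
        return customerId + " | " + name + " | " + phone;
    }
}

// In the job: class PurchaseSummary implements Writable.
class PurchaseSummary {
    private double avgPurchasePrice;
    private int totalPurchases;

    public PurchaseSummary() {}

    public PurchaseSummary(double avgPurchasePrice, int totalPurchases) {
        this.avgPurchasePrice = avgPurchasePrice;
        this.totalPurchases = totalPurchases;
    }

    public void write(DataOutput out) throws IOException {
        out.writeDouble(avgPurchasePrice);
        out.writeInt(totalPurchases);
    }

    public void readFields(DataInput in) throws IOException {
        avgPurchasePrice = in.readDouble();
        totalPurchases = in.readInt();
    }

    @Override
    public String toString() {
        return avgPurchasePrice + " | " + totalPurchases;
    }
}
```

With the default TextOutputFormat, the key's toString() and the value's toString() are written separated by a tab, so each output line holds the customer fields followed by the average and the count.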

I am assuming totalPurchases is the number of entries for each customer.

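In the mapper, read the text and create an instance of CustomerKey. The value should be the same as what you are emitting now.
In the reducer, create a PurchaseSummary instance and populate its values accordingly.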
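
The steps above do not spell out how the name and phone from File2 reach the same reducer call as the purchases from File1. A common reduce-side join pattern, swapped in here as an alternative and not what this answer describes, keys both inputs by customerId alone and tags each value with its source file (for example, by registering one mapper per file with MultipleInputs). The sketch below covers only the reducer's joining logic, written as a plain Java method so it runs without Hadoop. The "C|name|phone" and "P|price" tag format is a hypothetical convention, the class and method names are hypothetical, and customers without a File2 record or without purchases are dropped (an inner join).

```java
// Hypothetical tagging convention (an assumption, not from the answer):
//   File1 mapper emits (customerId, "P|<price>")
//   File2 mapper emits (customerId, "C|<name>|<phone>")
// Both arrive in the same reduce() call because they share the key.
class ReduceSideJoin {

    // Joins all tagged values for one customer into one output line:
    //   customerId | name | phone | avgPurchasePrice | totalPurchases
    // Returns null if the customer record or the purchases are missing.
    static String join(String customerId, Iterable<String> taggedValues) {
        String name = null;
        String phone = null;
        double sum = 0;
        int totalPurchases = 0;
        for (String v : taggedValues) {
            String[] parts = v.split("\\|", -1); // "|" is a regex metacharacter, so escape it
            if (parts[0].equals("C")) {
                name = parts[1];
                phone = parts[2];
            } else if (parts[0].equals("P")) {
                sum += Double.parseDouble(parts[1]);
                totalPurchases++;
            }
        }
        if (name == null || totalPurchases == 0) {
            return null; // inner join: skip unmatched customers
        }
        double avg = sum / totalPurchases; // floating-point average, not truncated
        return customerId + " | " + name + " | " + phone + " | " + avg + " | " + totalPurchases;
    }
}
```

Inside a Reducer<Text, Text, Text, NullWritable>, reduce() would collect each value's toString() into a list (Hadoop reuses the same Text instance while iterating, so copy the string rather than keeping the object), pass the list to join(), and, when the result is not null, write it as the key with NullWritable.get() as the value.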
