Hadoop matrix multiplication

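I am running the MapReduce matrix multiplication program from http://www.norstad.org/matrix-multiply/index.html. This implementation does not work correctly when the input matrices contain 0s, and I don't understand why, or how to modify the program so that it works. The MapReduce job completes, but the output is always a matrix whose elements are all 0.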

My input matrices A and B are:

Matrix A     Matrix B
0 0 0        6 7 4 
0 1 6        9 1 3 
7 8 9        7 6 2

Output matrix:

0 0 0
0 0 0
0 0 0
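For comparison, a plain-Java triple loop (no Hadoop involved; the class name ExpectedProduct is just for illustration) computes the product the job should have produced for these inputs:

```java
import java.util.Arrays;

// Plain-Java reference product, used only to check what the job should output.
class ExpectedProduct {
    // Naive multiplication: c[i][j] = sum over k of a[i][k] * b[k][j].
    static int[][] multiply(int[][] a, int[][] b) {
        int rows = a.length, inner = b.length, cols = b[0].length;
        int[][] c = new int[rows][cols];
        for (int i = 0; i < rows; i++) {
            for (int k = 0; k < inner; k++) {
                for (int j = 0; j < cols; j++) {
                    c[i][j] += a[i][k] * b[k][j];
                }
            }
        }
        return c;
    }

    public static void main(String[] args) {
        int[][] a = {{0, 0, 0}, {0, 1, 6}, {7, 8, 9}};
        int[][] b = {{6, 7, 4}, {9, 1, 3}, {7, 6, 2}};
        for (int[] row : multiply(a, b)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

Running it prints the rows [0, 0, 0], [51, 37, 15] and [177, 111, 70], so only the first row of the result should be zero.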

The following is taken from the job's log file:

Map output for matrix B:

##### Map setup: matrixA = false for hdfs://localhost/user/hadoop-user/B
strategy = 4
R1 = 4
I = 3
K = 3
J = 3
IB = 3
KB = 3
JB = 3
##### Map input: (0,0) 6
##### Map output: (0,0,0,1) (0,0,6) 
##### Map input: (0,1) 7
##### Map output: (0,0,0,1) (0,1,7) 
##### Map input: (0,2) 4
##### Map output: (0,0,0,1) (0,2,4) 
##### Map input: (1,0) 9
##### Map output: (0,0,0,1) (1,0,9) 
##### Map input: (1,1) 1
##### Map output: (0,0,0,1) (1,1,1) 
##### Map input: (1,2) 3
##### Map output: (0,0,0,1) (1,2,3) 
##### Map input: (2,0) 7
##### Map output: (0,0,0,1) (2,0,7) 
##### Map input: (2,1) 6
##### Map output: (0,0,0,1) (2,1,6) 
##### Map input: (2,2) 2
##### Map output: (0,0,0,1) (2,2,2)

Map output for matrix A:

##### Map setup: matrixA = true for hdfs://localhost/user/hadoop-user/A
strategy = 4
R1 = 4
I = 3
K = 3
J = 3
IB = 3
KB = 3
JB = 3
##### Map input: (1,1) 1
##### Map output: (0,0,0,0) (1,1,1) 
##### Map input: (1,2) 6
##### Map output: (0,0,0,0) (1,2,6) 
##### Map input: (2,0) 7
##### Map output: (0,0,0,0) (2,0,7) 
##### Map input: (2,1) 8
##### Map output: (0,0,0,0) (2,1,8) 
##### Map input: (2,2) 9
##### Map output: (0,0,0,0) (2,2,9)

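Note that the mapper for matrix A does not read the zeros from the input file. Both input files are stored in HDFS as sequence files, and the output is also a sequence file. Can anyone explain what the problem is? Thanks.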

The code for the Mapper class is shown below:

/** The job 1 mapper class. */
private static class Job1Mapper 
    extends Mapper<IndexPair, IntWritable, Key, Value>
{
    private Path path;
    private boolean matrixA;
    private Key key = new Key();
    private Value value = new Value();
    public void setup (Context context) {
        init(context);
        FileSplit split = (FileSplit)context.getInputSplit();
        path = split.getPath();
        matrixA = path.toString().startsWith(inputPathA);
        if (DEBUG) {
            System.out.println("##### Map setup: matrixA = " + matrixA + " for " + path);
            System.out.println("   strategy = " + strategy);
            System.out.println("   R1 = " + R1);
            System.out.println("   I = " + I);
            System.out.println("   K = " + K);
            System.out.println("   J = " + J);
            System.out.println("   IB = " + IB);
            System.out.println("   KB = " + KB);
            System.out.println("   JB = " + JB);
        }
    }
    private void printMapInput (IndexPair indexPair, IntWritable el) {
        System.out.println("##### Map input: (" + indexPair.index1 + "," + 
            indexPair.index2 + ") " + el.get());
    }
    private void printMapOutput (Key key, Value value) {
        System.out.println("##### Map output: (" + key.index1 + "," + 
            key.index2 + "," + key.index3 + "," + key.m + ") (" + 
            value.index1 + "," + value.index2 + "," + value.v + ") ");
    }
    private void badIndex (int index, int dim, String msg) {
        System.err.println("Invalid " + msg + " in " + path + ": " + index + " " + dim);
        System.exit(1);
    }
    public void map (IndexPair indexPair, IntWritable el, Context context)
        throws IOException, InterruptedException 
    {
        if (DEBUG) printMapInput(indexPair, el);
        int i = 0;
        int k = 0;
        int j = 0;
        if (matrixA) {
            i = indexPair.index1;
            if (i < 0 || i >= I) badIndex(i, I, "A row index");
            k = indexPair.index2;
            if (k < 0 || k >= K) badIndex(k, K, "A column index");
        } else {
            k = indexPair.index1;
            if (k < 0 || k >= K) badIndex(k, K, "B row index");
            j = indexPair.index2;
            if (j < 0 || j >= J) badIndex(j, J, "B column index");
        }
        value.v = el.get();
        if (matrixA) {
            // An A element (i,k) is needed by every column block jb of the result.
            key.index1 = i/IB;
            key.index3 = k/KB;
            key.m = 0;
            value.index1 = i % IB;
            value.index2 = k % KB;
            for (int jb = 0; jb < NJB; jb++) {
                key.index2 = jb;
                context.write(key, value);
                if (DEBUG) printMapOutput(key, value);
            }
        } else {
            // A B element (k,j) is needed by every row block ib of the result.
            key.index2 = j/JB;
            key.index3 = k/KB;
            key.m = 1;
            value.index1 = k % KB;
            value.index2 = j % JB;
            for (int ib = 0; ib < NIB; ib++) {
                key.index1 = ib;
                context.write(key, value);
                if (DEBUG) printMapOutput(key, value);
            }
        }
    }
}
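To see where each element goes, the key/value arithmetic in map can be restated without Hadoop types. This is a sketch: MapperKeySim is a hypothetical name, and it assumes that NIB and NJB (set in init, which is not shown) are the number of row and column blocks, ceil(I/IB) and ceil(J/JB).

```java
import java.util.ArrayList;
import java.util.List;

// Restates Job1Mapper's key/value arithmetic without Hadoop types.
// Output strings mimic printMapOutput: "(index1,index2,index3,m) (index1,index2,v)".
class MapperKeySim {
    static List<String> emit(boolean matrixA, int row, int col, int v,
                             int I, int J, int IB, int KB, int JB) {
        int nib = (I + IB - 1) / IB;  // assumption: NIB = ceil(I / IB)
        int njb = (J + JB - 1) / JB;  // assumption: NJB = ceil(J / JB)
        List<String> out = new ArrayList<>();
        if (matrixA) {                // A input: row = i, col = k
            int i = row, k = col;
            for (int jb = 0; jb < njb; jb++) {
                out.add(String.format("(%d,%d,%d,0) (%d,%d,%d)",
                        i / IB, jb, k / KB, i % IB, k % KB, v));
            }
        } else {                      // B input: row = k, col = j
            int k = row, j = col;
            for (int ib = 0; ib < nib; ib++) {
                out.add(String.format("(%d,%d,%d,1) (%d,%d,%d)",
                        ib, j / JB, k / KB, k % KB, j % JB, v));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // b(2,1) = 6 with I = J = 3 and all block sizes 3, as in the log
        System.out.println(emit(false, 2, 1, 6, 3, 3, 3, 3, 3));
        // a(1,2) = 6 with the same settings
        System.out.println(emit(true, 1, 2, 6, 3, 3, 3, 3, 3));
    }
}
```

With I = J = 3 and IB = KB = JB = 3, the B element b(2,1) = 6 produces the single line (0,0,0,1) (2,1,6), matching the log above: every element falls into the one block (0,0,0).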

If there are any syntax errors, it is only because I copy-pasted the code and edited it here, so I may have missed something.

I need help with the logic of the map function: it does not read the 0s from the input file. Can anyone tell me why?

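In TestMatrixMultiply.java from the site you linked (which presumably contains the code for encoding matrices into the file format that IndexPair expects), there is a function writeMatrix: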

public static void writeMatrix (int[][] matrix, int rowDim, int colDim, String pathStr)
    throws IOException
{
    Path path = new Path(pathStr);
    SequenceFile.Writer writer = SequenceFile.createWriter(fs, conf, path, 
        MatrixMultiply.IndexPair.class, IntWritable.class, 
        SequenceFile.CompressionType.NONE);
    MatrixMultiply.IndexPair indexPair = new MatrixMultiply.IndexPair();
    IntWritable el = new IntWritable();
    for (int i = 0; i < rowDim; i++) {
        for (int j = 0; j < colDim; j++) {
            int v = matrix[i][j];
            if (v != 0) { // !!! well, that would be why we aren't writing 0s
                indexPair.index1 = i;
                indexPair.index2 = j;
                el.set(v);
                writer.append(indexPair, el);
            }
        }
    }
    writer.close();
}

(I added the comment on the second line of the inner for loop.)

Your mapper isn't reading 0s because your input files don't contain any 0s.
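The effect of the v != 0 check in writeMatrix can be reproduced in memory. A minimal sketch (a hypothetical SparseEncoding class, no SequenceFile or HDFS) that returns the (row, column, value) triples writeMatrix would append:

```java
import java.util.ArrayList;
import java.util.List;

// Mimics writeMatrix's "if (v != 0)" filter in memory (no SequenceFile, no HDFS).
class SparseEncoding {
    // Returns {row, col, value} triples in the order writeMatrix would append them.
    static List<int[]> nonZeroEntries(int[][] m) {
        List<int[]> out = new ArrayList<>();
        for (int i = 0; i < m.length; i++) {
            for (int j = 0; j < m[i].length; j++) {
                if (m[i][j] != 0) {   // zeros are skipped, exactly as in writeMatrix
                    out.add(new int[] {i, j, m[i][j]});
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[][] a = {{0, 0, 0}, {0, 1, 6}, {7, 8, 9}};
        for (int[] e : nonZeroEntries(a)) {
            System.out.println("(" + e[0] + "," + e[1] + ") " + e[2]);
        }
    }
}
```

For matrix A it yields five triples, (1,1) 1, (1,2) 6, (2,0) 7, (2,1) 8 and (2,2) 9, which are exactly the five map inputs in the log for A.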

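The code is deliberately designed to assume that every missing value is 0, and it performs that extra check to avoid emitting 0s, in an attempt to minimize network traffic.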
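Treating missing entries as 0 loses nothing for multiplication, because an absent a(i,k) or b(k,j) would only have contributed a 0 product. A sketch of that idea (a hypothetical SparseMultiply class; a single in-memory loop, not the block-wise MapReduce algorithm) that multiplies two lists of non-zero triples:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// In-memory sparse multiply: entries are {row, col, value}; anything not listed is 0.
class SparseMultiply {
    static int[][] multiply(List<int[]> a, List<int[]> b, int rows, int cols) {
        // Group B's entries by row index k so each a(i,k) only meets b(k,j) entries.
        Map<Integer, List<int[]>> bByRow = new HashMap<>();
        for (int[] e : b) {
            bByRow.computeIfAbsent(e[0], r -> new ArrayList<>()).add(e);
        }
        int[][] c = new int[rows][cols];  // Java arrays start at 0, so missing terms stay 0
        for (int[] ea : a) {              // ea = {i, k, a(i,k)}
            List<int[]> row = bByRow.getOrDefault(ea[1], Collections.<int[]>emptyList());
            for (int[] eb : row) {        // eb = {k, j, b(k,j)}
                c[ea[0]][eb[1]] += ea[2] * eb[2];
            }
        }
        return c;
    }

    public static void main(String[] args) {
        List<int[]> a = Arrays.asList(
                new int[] {1, 1, 1}, new int[] {1, 2, 6},
                new int[] {2, 0, 7}, new int[] {2, 1, 8}, new int[] {2, 2, 9});
        List<int[]> b = new ArrayList<>();
        int[][] dense = {{6, 7, 4}, {9, 1, 3}, {7, 6, 2}};
        for (int k = 0; k < 3; k++) {
            for (int j = 0; j < 3; j++) {
                b.add(new int[] {k, j, dense[k][j]});
            }
        }
        for (int[] r : multiply(a, b, 3, 3)) {
            System.out.println(Arrays.toString(r));
        }
    }
}
```

Fed only the five non-zero triples of A and the nine of B, it returns the same matrix as the dense product: [0, 0, 0], [51, 37, 15], [177, 111, 70].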

Everything below is wrong, because I misunderstood the algorithm
(although the part above is still useful).

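In the linked page, you're using strategy 3. Strategy 3 relies on the behavior of the partitioner and on the sort order. Unfortunately, the partitioner is wrong! This is a separate issue from the 0s not being written out. The partitioner here is simply wrong: you get a matrix full of 0s because it multiplies data that was initialized to 0 and never overwritten with the correct data for the block. This is masked when running on a single machine, because there the partitioner is a no-op, but it breaks on most clusters.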

The partitioner maps the intermediate key (kb, jb, ib) to a reducer as follows:
r = (jb*KB + kb) mod R
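As a quick way to check where a given key lands, the formula can be written as a plain function. This only transcribes the formula as stated in this answer (the class name Strategy3Partition and its parameter names are illustrative), not the partitioner from the linked code:

```java
// Transcribes r = (jb*KB + kb) mod R as stated in the answer.
class Strategy3Partition {
    static int partition(int kb, int jb, int KB, int R) {
        return (jb * KB + kb) % R;  // all operands are non-negative, so % acts as mod
    }

    public static void main(String[] args) {
        int KB = 2, R = 4;  // illustrative values only
        for (int jb = 0; jb < 2; jb++) {
            for (int kb = 0; kb < 2; kb++) {
                System.out.println("(kb=" + kb + ", jb=" + jb + ") -> reducer " + partition(kb, jb, KB, R));
            }
        }
    }
}
```

For example, with KB = 2 and R = 4, the key with (kb, jb) = (1, 1) goes to reducer (1*2 + 1) mod 4 = 3.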

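It needs to ensure that all keys for the same block are assigned to the same reducer. Unfortunately, it guarantees that this does not happen unless KB % numReducers == 0: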

map (key, value)
   if from matrix A with key=(i,k) and value=a(i,k)
      for 0 <= jb < NJB
         emit (k/KB, jb, i/IB), (i mod IB, k mod KB, a(i,k)) // compare this...
   if from matrix B with key=(k,j) and value=b(k,j)
      emit (k/KB, j/JB, -1), (k mod KB, j mod JB, b(k,j))   // ...to this
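A direct Java transcription of the map pseudocode above, returning the emitted pairs as strings, makes it easy to check which (kb, jb, ib) keys an A element and a B element produce. It is a sketch: the Strategy3Map class is hypothetical, and it assumes NJB = ceil(J/JB).

```java
import java.util.ArrayList;
import java.util.List;

// Transcribes the strategy-3 map pseudocode; each string is "key -> value".
class Strategy3Map {
    static List<String> emit(boolean fromA, int row, int col, int v,
                             int J, int IB, int KB, int JB) {
        List<String> out = new ArrayList<>();
        if (fromA) {                         // key = (i, k), value = a(i,k)
            int i = row, k = col;
            int njb = (J + JB - 1) / JB;     // assumption: NJB = ceil(J / JB)
            for (int jb = 0; jb < njb; jb++) {
                out.add(String.format("(%d,%d,%d) -> (%d,%d,%d)",
                        k / KB, jb, i / IB, i % IB, k % KB, v));
            }
        } else {                             // key = (k, j), value = b(k,j)
            int k = row, j = col;
            out.add(String.format("(%d,%d,-1) -> (%d,%d,%d)",
                    k / KB, j / JB, k % KB, j % JB, v));
        }
        return out;
    }

    public static void main(String[] args) {
        // J = 3 and block sizes IB = KB = JB = 2 (illustrative values)
        System.out.println(emit(true, 2, 1, 8, 3, 2, 2, 2));   // a(2,1) = 8
        System.out.println(emit(false, 1, 2, 3, 3, 2, 2, 2));  // b(1,2) = 3
    }
}
```

With these settings, a(2,1) = 8 yields (0,0,1) -> (0,1,8) and (0,1,1) -> (0,1,8), while b(1,2) = 3 yields (0,1,-1) -> (1,0,3).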

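For matrix A, the key's jb is iterated over; for matrix B, it is computed. Since assignment to reducers is round-robin, it is guaranteed that the A-matrix keys will not be assigned to the same reducer as the B-matrix keys. The algorithm therefore fails, because it relies on assumptions about the assignment and ordering of keys that do not hold. The key order is correct if and only if all keys are assigned to a single reducer, but the partitioner is wrong!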

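The partitioner must be changed to use kb % numReducers for strategy 3. It is not a very good partitioner, but it is the only one that works, because keys that are not identical still need to be sorted, in a specific order, into the same reducer.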
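The suggested change can be sketched as follows: the reducer is chosen from kb alone, so every key that shares a kb, whatever its jb or ib, ends up at the same reducer. In a real job this body would go in the getPartition(key, value, numPartitions) method of an org.apache.hadoop.mapreduce.Partitioner subclass; the plain-Java KbPartition class below is a hypothetical stand-in that takes the key fields directly.

```java
// Suggested strategy-3 partitioner: choose the reducer from kb only.
class KbPartition {
    static int partition(int kb, int jb, int ib, int numReducers) {
        return kb % numReducers;  // jb and ib are ignored on purpose
    }

    public static void main(String[] args) {
        int numReducers = 4;  // illustrative value
        // An A key (kb, jb, ib) and a B key (kb, jb, -1) with the same kb:
        System.out.println(partition(5, 2, 1, numReducers));
        System.out.println(partition(5, 2, -1, numReducers));
    }
}
```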

The code you actually put in the question has nothing to do with where the bug actually is.
