In Hadoop, when you save the value of each key-value pair into an ArrayList, why do all the added elements end up the same?



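I am trying to store the values from the key-value pairs that the Map function receives so that I can use them later. Given the following input: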

Hello hadoop goodbye hadoop
Hello world goodbye world
Hello thinker goodbye thinker

and the following code:

Note: the map function is the simple WordCount example.

public class Inception extends Configured implements Tool{
public Path workingPath;
 public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();
     // initialising the arrays that contain the values and the keys
    public ArrayList<LongWritable> keyBuff = new ArrayList<LongWritable>();
    public ArrayList<Text> valueBuff = new ArrayList<Text>();

    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        String line = value.toString();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken());
            context.write(word, one);
            System.out.println(word + " / " + one);
        }
    }   
    public void innerMap(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // adding the value to the buffer
        valueBuff.add(value);
        System.out.println("ArrayList addValue -> " + value);
        for (Text v : valueBuff){
            System.out.println("ArrayList containedValue -> " + value);
        }
        keyBuff.add(key);
        }   
    public void run(Context context) throws IOException, InterruptedException {
        setup(context);
        // going over the key-value pairs and storing them into the arrays
        while(context.nextKeyValue()){
            innerMap(context.getCurrentKey(), context.getCurrentValue(), context);
        }

        Iterator itrv = valueBuff.iterator();
        Iterator itrk = keyBuff.iterator();
        while(itrv.hasNext()){
            LongWritable nextk = (LongWritable) itrk.next();
            Text nextv = (Text) itrv.next();
            System.out.println("Value iterator -> " + nextv);
            System.out.println("Key iterator -> " + nextk);
            // iterating over the values and running the map on them.
            map(nextk, nextv, context);
        }
        cleanup(context);
    }
 }
 public int run(String[] args) throws Exception { ... }
 public static void main (..) { ... }

OK, and now the log output:

stdout log
ArrayList addValue -> Hello hadoop goodbye hadoop
ArrayList containedValue -> Hello hadoop goodbye hadoop
ArrayList addValue -> Hello world goodbye world
ArrayList containedValue -> Hello world goodbye world
ArrayList containedValue -> Hello world goodbye world
ArrayList addValue -> Hello thinker goodbye thinker
ArrayList containedValue -> Hello thinker goodbye thinker
ArrayList containedValue -> Hello thinker goodbye thinker
ArrayList containedValue -> Hello thinker goodbye thinker
Value iterator -> Hello thinker goodbye thinker
Key iterator -> 84
Hello / 1
thinker / 1
goodbye / 1
thinker / 1
Value iterator -> Hello thinker goodbye thinker
Key iterator -> 84
Hello / 1
thinker / 1
goodbye / 1
thinker / 1
Value iterator -> Hello thinker goodbye thinker
Key iterator -> 84
Hello / 1
thinker / 1
goodbye / 1
thinker / 1

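As you can see, every time I add a new value to the ArrayList valueBuff, all of the values already in the list are overwritten. Does anyone know why this happens, and why the values are not added to the list correctly?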

TextInputFormat uses LineRecordReader. When Context#nextKeyValue is called, LineRecordReader#nextKeyValue is called as well.

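LineRecordReader reuses the same key and value objects on every call to nextKeyValue; only their contents change. As a result, valueBuff and keyBuff end up holding several references to the same two objects, which is why every entry shows the last line read and the key 84. If you want to retain the key and value data, you must create copies of the objects in your own code.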

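The reuse makes sense as an optimization: if a new key object and a new value object were created for every record, the system could easily run out of memory (OOM).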
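
To make the effect concrete, here is a minimal sketch in plain Java that does not depend on Hadoop. `ReuseDemo`, `MutableText`, and `collect` are hypothetical names invented for this illustration: `MutableText` stands in for a reusable Writable such as `Text`, and `collect` mimics a reader that hands out the same object for every record. In the Mapper from the question, the equivalent fix would presumably be `valueBuff.add(new Text(value))` and `keyBuff.add(new LongWritable(key.get()))`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; not part of Hadoop.
class ReuseDemo {

    // Hypothetical stand-in for a mutable, reusable Writable such as Text.
    static final class MutableText {
        private String content = "";

        MutableText() {}

        // Copy constructor: snapshots the current content of another instance.
        MutableText(MutableText other) {
            this.content = other.content;
        }

        void set(String s) {
            content = s;
        }

        @Override
        public String toString() {
            return content;
        }
    }

    // Mimics a record reader that reuses one value object for all records.
    // With copy == false the list stores the shared reference each time;
    // with copy == true it stores an independent copy per record.
    static List<MutableText> collect(String[] lines, boolean copy) {
        MutableText reused = new MutableText();
        List<MutableText> buffer = new ArrayList<>();
        for (String line : lines) {
            reused.set(line); // overwrite contents in place, same object
            buffer.add(copy ? new MutableText(reused) : reused);
        }
        return buffer;
    }

    public static void main(String[] args) {
        String[] lines = {
            "Hello hadoop goodbye hadoop",
            "Hello world goodbye world",
            "Hello thinker goodbye thinker"
        };
        System.out.println("by reference: " + collect(lines, false));
        System.out.println("by copy:      " + collect(lines, true));
    }
}
```

Run as is, the by-reference list shows the last line three times, matching the log in the question, while the by-copy list keeps each line. The copy costs one extra object per buffered record, so it is only worth doing for records that really need to outlive the current call.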
