Regex pattern to remove the square brackets and commas from a JSON array so that MapReduce can be applied

OK, so I basically have a JSON array that looks like this:

[
  {
    product:something, 
    version:something
  },
  {
    product: something,
    version: something
  }
]

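I need to remove the commas between the JSON objects, that is, the commas that follow a closing curly brace, and I also need to remove the square brackets. This is necessary because I am deserializing element by element, so a comma or bracket between elements gives me an error, or at least that is how it looks to me.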

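Anyway, I have been trying to build a regex pattern to replace those characters. For example, suppose the first JSON element that gets read is a string like this: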

[ 
 {
  product:something,
  version:something
 },

So I have my pattern, which looks like this: [[]/}(?=,)], but it matches the last bracket in the JSON and all of the commas, which is not what I need.

Can anyone help me? Or at least point me to a tutorial or something?

EDIT: I can't use any deserializer or anything like that; basically, I am reading each element of the JSON array as a single line, delimited by "}".
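For reference, here is a minimal sketch of the kind of regex the question is after. It is not taken from the original post. It assumes each chunk holds at most one element and only needs the array punctuation trimmed from its two ends; the class name JsonChunkCleaner and the method clean are made up for illustration. Square brackets start a character class in regex syntax, so a literal [ or ] has to be escaped, and the \A and \z anchors keep the match away from commas and brackets inside the element.

```java
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
final class JsonChunkCleaner {
    // Two alternatives, both anchored to an end of the input:
    //   \A[\s\[,]+   a leading run of whitespace, '[' and ','
    //   [\s,\]]+\z   a trailing run of whitespace, ',' and ']'
    private static final Pattern EDGE_PUNCTUATION =
            Pattern.compile("\\A[\\s\\[,]+|[\\s,\\]]+\\z");

    // Returns the chunk without array punctuation at its start and end.
    // Commas and brackets in the middle of the chunk are left alone.
    static String clean(String chunk) {
        return EDGE_PUNCTUATION.matcher(chunk).replaceAll("");
    }

    public static void main(String[] args) {
        // The first chunk from the question.
        String first = "[ \n {\n  product:something,\n  version:something\n },\n";
        System.out.println(clean(first));
    }
}
```

A chunk that contains only punctuation, such as the closing bracket at the end of the file, comes back as an empty string and can be skipped before deserializing.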

I found a way to do this that doesn't involve any regex or any changes to the JSON. Anyway, here it goes:

First, go to https://gist.github.com/Lupus/9988093 and grab the WholeFileInputFormat for the API you are using. Then, here is an example of how I used it in one of my mappers:

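package mapreduce;

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Mapper;
import org.json.JSONArray;
import org.json.JSONException;
import org.json.JSONObject;

// TextTriplet is a custom composite key class (its definition is not shown in this post).
public class CommonErrorsMapper extends Mapper<NullWritable, BytesWritable, TextTriplet, IntWritable> {

    private static final IntWritable ONE = new IntWritable(1);

    @Override
    public void map(NullWritable key, BytesWritable value, Context context) throws IOException, InterruptedException {
        // The whole file arrives as a single value. getBytes() returns the backing
        // buffer, which can be longer than the data, so decode only getLength() bytes.
        // Assumes the file is UTF-8 encoded.
        String json = new String(value.getBytes(), 0, value.getLength(), StandardCharsets.UTF_8);
        try {
            JSONArray array = new JSONArray(json);
            // Emit one (product, version, errorCode) key with a count of 1 per element.
            for (int i = 0; i < array.length(); i++) {
                JSONObject element = array.getJSONObject(i);
                String product = element.getString("product");
                String version = element.getString("version");
                String errorCode = element.getString("errorCode");
                context.write(new TextTriplet(product, version, errorCode), ONE);
            }
        } catch (JSONException error) {
            // Malformed JSON or a missing key: log it and skip the rest of this file.
            error.printStackTrace();
        }
    }
}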

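I'm leaving this here because I find MapReduce quite hard to understand, and even harder when working with a format like JSON. Anyway, that seems to be all there is to it, unless someone else finds a way that doesn't require reading the whole file.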
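On that last point, one possible direction is to scan the input one character at a time and hand over each top-level object as soon as its closing brace is seen, so the full array never has to sit in memory. The following is only a rough sketch of that scanning step in plain Java, not something from the original post; TopLevelObjectScanner and forEachObject are made-up names. It does not deal with the Hadoop side: an input split can end in the middle of an object, so this logic would presumably have to live in a custom record reader or run on non-splittable input.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.function.Consumer;

// Hypothetical helper, for illustration only.
final class TopLevelObjectScanner {
    // Reads a JSON array from 'in' and passes the text of each top-level
    // object to 'sink' as soon as that object is complete.
    static void forEachObject(Reader in, Consumer<String> sink) {
        StringBuilder current = new StringBuilder();
        int depth = 0;            // number of '{' currently open
        boolean inString = false; // inside a double-quoted string
        boolean escaped = false;  // previous character was a backslash in a string
        try {
            int c;
            while ((c = in.read()) != -1) {
                char ch = (char) c;
                if (depth == 0 && ch != '{') {
                    continue; // skips '[', ']', ',' and whitespace between objects
                }
                current.append(ch);
                if (inString) {
                    if (escaped) escaped = false;
                    else if (ch == '\\') escaped = true;
                    else if (ch == '"') inString = false;
                } else if (ch == '"') {
                    inString = true;
                } else if (ch == '{') {
                    depth++;
                } else if (ch == '}' && --depth == 0) {
                    sink.accept(current.toString()); // one complete element
                    current.setLength(0);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Reader in = new StringReader("[{\"product\":\"a\"},\n {\"product\":\"b\"}]");
        forEachObject(in, System.out::println);
    }
}
```

Tracking the string state matters because a brace inside a quoted value must not change the depth; each string passed to the sink could then go to new JSONObject(...) in place of the JSONArray loop.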
