Converting ORC to JSON in Java



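I'm trying to convert an ORC output file to JSON in Java, inside a unit test. I've been reading through ORC's own unit tests and took inspiration from this: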

PrintStream origOut = System.out;
String tmpFileLocationJson = createTempFileJson();
String orcFile = "output-1595448128191.orc"; // the ORC file to dump
try (FileOutputStream myOut = new FileOutputStream(tmpFileLocationJson)) {
    // replace stdout and run the same "data" command as the CLI;
    // org.apache.orc.tools.Driver is the uber jar's entry point, whereas
    // FileDump.main would treat "data" as a file name
    System.setOut(new PrintStream(myOut, true, StandardCharsets.UTF_8.toString()));
    Driver.main(new String[]{"data", orcFile});
    System.out.flush();
} finally {
    // restore stdout even if the dump fails
    System.setOut(origOut);
}
System.out.println("done");

Something like this. The problem is that I'm not sure how to make this code equivalent to running the orc-tools command-line utility.

For example, running java -jar orc-tools-1.5.5-uber.jar data output-1595448128191.orc prints the following JSON dump:

{"integerExample":1,"nestedExample":{"sub1":"value1","sub2":42},"dateExample":"2018-01-04"}

So I want to convert the ORC file to JSON so that I can cross-reference it in my unit test.
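Since the goal is to compare the dump inside a test, one alternative to a temp file is to capture System.out in memory and compare the resulting string. The sketch below uses only the JDK; StdoutCapture and ThrowingTask are hypothetical names, and it assumes the tool writes its dump to System.out (which the stdout-redirect approach above also relies on) and does not call System.exit. System.out is restored in a finally block, so a failing dump does not leave later tests printing into the buffer.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;

public class StdoutCapture {

    /** A task that may throw checked exceptions, e.g. a tool's main(String[]). */
    @FunctionalInterface
    public interface ThrowingTask {
        void run() throws Exception;
    }

    /**
     * Runs the task with System.out redirected to an in-memory buffer and
     * returns everything it printed, decoded as UTF-8. The original
     * System.out is restored even if the task throws.
     */
    public static String captureStdout(ThrowingTask task) throws Exception {
        PrintStream original = System.out;
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (PrintStream capture =
                 new PrintStream(buffer, true, StandardCharsets.UTF_8.name())) {
            System.setOut(capture);
            task.run();
            capture.flush();
        } finally {
            System.setOut(original);
        }
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        String out = captureStdout(() -> System.out.print("{\"integerExample\":1}"));
        System.out.println(out); // prints {"integerExample":1}
    }
}
```

With orc-tools on the test classpath, usage might look like StdoutCapture.captureStdout(() -> Driver.main(new String[]{"data", "output-1595448128191.orc"})), assuming org.apache.orc.tools.Driver is the uber jar's entry point as in the CLI call above. Because System.out is global, tests that capture it should not run in parallel.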

Edit: this may be package-private :( https://github.com/apache/orc/blob/b9e82b3d7b473201bdcf46011c3b2fda10ef897f/java/tools/src/java/org/apache/orc/tools/PrintData.java#L227
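If the method is indeed package-private, two common workarounds are to declare a small helper class in the same package (org.apache.orc.tools) in the test sources, which works on the classpath as long as that package is not sealed, or to call the method reflectively. The sketch below shows the reflective route with the JDK's java.lang.reflect API; PackagePrivateCall and the nested Library class are hypothetical stand-ins so the example runs without ORC.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

public class PackagePrivateCall {

    /**
     * Invokes a static method regardless of its access modifier and
     * rethrows whatever the target method itself threw.
     */
    public static Object invokeStatic(Class<?> owner, String name,
                                      Class<?>[] paramTypes, Object... args) throws Exception {
        Method method = owner.getDeclaredMethod(name, paramTypes);
        method.setAccessible(true); // bypass package-private access checks
        try {
            return method.invoke(null, args); // null receiver: static method
        } catch (InvocationTargetException e) {
            // unwrap so callers see the target method's own exception
            Throwable cause = e.getCause();
            if (cause instanceof Exception) {
                throw (Exception) cause;
            }
            if (cause instanceof Error) {
                throw (Error) cause;
            }
            throw e;
        }
    }

    // Stand-in for a library class with a package-private static method.
    static class Library {
        static String greet(String who) {
            return "hello " + who;
        }
    }

    public static void main(String[] args) throws Exception {
        Object result = invokeStatic(Library.class, "greet",
                new Class<?>[] {String.class}, "orc");
        System.out.println(result); // prints "hello orc"
    }
}
```

Against ORC this might be invokeStatic(Class.forName("org.apache.orc.tools.PrintData"), "printJsonData", new Class<?>[]{PrintStream.class, Reader.class}, printStream, reader), assuming the printJsonData(PrintStream, Reader) signature from the linked source. Non-public methods can change between releases, so a test built this way is tied to one orc-tools version.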

OK, I took the code from Hive, replaced the output stream with a FileWriter, and wrote the output to a file that the test then reads back.

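import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;
import org.apache.orc.Reader;
import org.apache.orc.RecordReader;
import org.apache.orc.TypeDescription;
import org.codehaus.jettison.json.JSONException;
import org.codehaus.jettison.json.JSONWriter;

// Adapted from PrintData.printJsonData; printRow(...) and its helpers are
// copied from PrintData as well. Output goes to <fileName>.json, so the
// PrintStream parameter and its checkError() test are no longer needed.
static void printJsonData(String fileName, Reader reader)
        throws IOException, JSONException {
    // UTF-8 explicitly (FileWriter would use the platform default charset);
    // try-with-resources closes both the file and the row reader
    try (BufferedWriter out = Files.newBufferedWriter(
             Paths.get(fileName + ".json"), StandardCharsets.UTF_8);
         RecordReader rows = reader.rows()) {
        TypeDescription schema = reader.getSchema();
        VectorizedRowBatch batch = schema.createRowBatch();
        while (rows.nextBatch(batch)) {
            for (int r = 0; r < batch.size; ++r) {
                // a fresh JSONWriter per row, as PrintData does
                JSONWriter writer = new JSONWriter(out);
                printRow(writer, batch, schema, r);
                out.write("\n"); // one JSON record per line
            }
        }
    }
}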
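The file written above is newline-delimited JSON (one record per line), so the test can read it back line by line and compare it with the expected records, such as the dump shown earlier. The stdlib-only sketch below isolates that file handling; JsonLines, writeRecords and readRecords are hypothetical names, and records are plain strings here, whereas the real method produces them through printRow and JSONWriter. Exact string comparison assumes the dump emits fields in a stable order; comparing parsed JSON with a JSON library would be more robust.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class JsonLines {

    /** Writes each serialized JSON record on its own line as UTF-8, then closes the file. */
    public static void writeRecords(Path target, Iterable<String> records) throws IOException {
        try (BufferedWriter out = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            for (String record : records) {
                out.write(record);
                out.write('\n'); // newline character, not the letter 'n'
            }
        }
    }

    /** Reads the records back, trimming each line and skipping blank ones. */
    public static List<String> readRecords(Path source) throws IOException {
        List<String> records = new ArrayList<>();
        for (String line : Files.readAllLines(source, StandardCharsets.UTF_8)) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                records.add(trimmed);
            }
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        Path dump = Files.createTempFile("orc-dump", ".json");
        try {
            List<String> expected = Arrays.asList(
                    "{\"integerExample\":1,\"dateExample\":\"2018-01-04\"}");
            writeRecords(dump, expected);
            System.out.println(readRecords(dump).equals(expected)); // prints "true"
        } finally {
            Files.delete(dump);
        }
    }
}
```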
