How do I parse regular (not newline-delimited) JSON with Apache Beam and Jackson?

I am trying to learn how to parse JSON data into CSV format with Apache Beam and Jackson. I started with a very simple JSON file:

{
"firstName": "John", 
"lastName": "Smith", 
"isAlive": true, 
"age": 27
}

I have a corresponding POJO:

import java.io.Serializable;
import com.fasterxml.jackson.annotation.JsonIgnoreProperties;

@JsonIgnoreProperties(ignoreUnknown = true)
public class Person implements Serializable {

    private String firstName;
    private String lastName;
    private int age;

    public Person() {}

    public String getFirstName() {
        return firstName;
    }
    ... getters & setters ...
}

However, when I try to parse this JSON, I get a formatting error:

Caused by: com.fasterxml.jackson.core.JsonParseException: Unexpected close marker '}': expected ']' (for root starting at [Source: }; line: 1, column: 0])
at [Source: }; line: 1, column: 2]

I worked around it by converting the JSON to the following single-line format:

{"firstName": "John", "lastName": "Smith", "isAlive": true, "age": 27}

Ultimately, I need to handle plain old (pretty-printed) JSON. Is there a way to do this? If so, how?

The Apache Beam code is a simple pipeline:

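import org.apache.beam.runners.direct.DirectRunner;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.coders.SerializableCoder;
import org.apache.beam.sdk.extensions.jackson.ParseJsons;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.TypeDescriptors;

public class DataToModel {
    public static void main(String[] args) {
        PipelineOptions options = PipelineOptionsFactory.create();
        options.setRunner(DirectRunner.class);
        Pipeline p = Pipeline.create(options);
        // read data from the JSON file (one element per line)
        PCollection<String> json = p.apply(TextIO.read().from("src/main/resources/test.json"));
        // parse each element as JSON into a Person
        PCollection<Person> person = json
            .apply(ParseJsons.of(Person.class))
            .setCoder(SerializableCoder.of(Person.class));
        // extract the first name of each Person
        PCollection<String> names = person.apply(MapElements
            .into(TypeDescriptors.strings())
            .via(Person::getFirstName)
        );
        // write information to file.
        names.apply(TextIO.write().to("src/main/resources/test_out"));
        p.run().waitUntilFinish();
    }
}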

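The problem is that you are using TextIO.read() to read from the JSON file. TextIO reads each line of a text file into a separate element, so a multi-line JSON object is split across several elements. As a result, your parse function ends up trying to parse JSON strings such as "}" on their own. This also explains why parsing succeeds when the whole object is formatted on a single line.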
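To make the cause concrete, here is a minimal plain-Java sketch that mimics line-oriented reading without using Beam at all. The class name LineSplitDemo and its method are hypothetical and exist only for illustration; the point is simply that each "record" is a fragment of the object, which matches the parser complaining about a lone "}".

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class: imitates a line-oriented reader, no Beam involved.
class LineSplitDemo {

    // The pretty-printed document from the question, as a single string.
    static final String PRETTY_JSON = String.join("\n",
            "{",
            "\"firstName\": \"John\",",
            "\"lastName\": \"Smith\",",
            "\"isAlive\": true,",
            "\"age\": 27",
            "}");

    // Split the text on newlines, the way a line-based source emits records.
    static List<String> toLineRecords(String text) {
        return Arrays.asList(text.split("\n"));
    }

    public static void main(String[] args) {
        // Every record is only a piece of the object; the last one is just "}".
        for (String record : toLineRecords(PRETTY_JSON)) {
            System.out.println("record: " + record);
        }
    }
}
```

None of the six records is a complete JSON document, so a per-element JSON parser fails on them.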

There are two approaches you can take, depending on what is available to you.

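If possible, you can use the withDelimiter method to split on a custom delimiter instead of the default newline. However, this is rather brittle and requires your files to be formatted in a very specific way.
You can switch from TextIO to FileIO and read each file as a single string that is passed to ParseJsons. This is slightly more work, but far less brittle, and it is what I recommend.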
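The two approaches can be sketched in plain Java to show what each one would hand to the parser. This is only an illustration under stated assumptions, not Beam code: the class ReadStrategies, its methods, and the "---" separator line are all hypothetical. In Beam itself, the counterparts would be TextIO.read().withDelimiter(...) for the first approach and, for the second, FileIO.match().filepattern(...) followed by FileIO.readMatches() and ReadableFile.readFullyAsUTF8String(); check the Beam Javadoc for your SDK version before relying on those.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class: plain Java standing in for the two read strategies.
class ReadStrategies {

    // Approach 1: split on a custom delimiter instead of on newlines.
    // Brittle: it only works if the delimiter never occurs inside a document.
    static List<String> splitOnDelimiter(String content, String delimiter) {
        List<String> docs = new ArrayList<>();
        int start = 0;
        int idx;
        while ((idx = content.indexOf(delimiter, start)) >= 0) {
            docs.add(content.substring(start, idx));
            start = idx + delimiter.length();
        }
        if (start < content.length()) {
            docs.add(content.substring(start)); // trailing document, if any
        }
        return docs;
    }

    // Approach 2: treat one file as one document, whatever its line layout.
    static String readWholeFile(Path file) throws IOException {
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        String json = "{\n\"firstName\": \"John\",\n\"age\": 27\n}\n";
        Path tmp = Files.createTempFile("person", ".json");
        Files.write(tmp, json.getBytes(StandardCharsets.UTF_8));
        // The whole pretty-printed object arrives as a single string.
        System.out.println(readWholeFile(tmp));
        Files.delete(tmp);
    }
}
```

The whole-file variant makes no assumption about how the JSON is laid out, which is why it is the more robust of the two.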

You can use the org.json library; it is easy to use.

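Keep in mind (when casting or using methods such as getJSONObject and getJSONArray) that in JSON notation [ … ] represents an array, so the library parses it as a JSONArray, while { … } represents an object, so the library parses it as a JSONObject.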
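As a small illustration of that rule, the following plain-Java sketch looks at the first non-whitespace character to decide whether a document's root would map to a JSONObject or a JSONArray. The class JsonRootKind and its method are hypothetical helpers, not part of org.json, and the check does not validate the rest of the document.

```java
// Hypothetical helper: classifies the root of a JSON text by its first character.
class JsonRootKind {

    // Returns "object" for {...}, "array" for [...], and "other" for anything else.
    static String rootKind(String json) {
        for (int i = 0; i < json.length(); i++) {
            char c = json.charAt(i);
            if (Character.isWhitespace(c)) {
                continue; // skip leading whitespace
            }
            if (c == '{') {
                return "object"; // would be handled as a JSONObject
            }
            if (c == '[') {
                return "array"; // would be handled as a JSONArray
            }
            return "other"; // a bare string, number, true, false, or null
        }
        return "other"; // empty or whitespace-only input
    }

    public static void main(String[] args) {
        System.out.println(rootKind("{\"pageInfo\": {}}"));
        System.out.println(rootKind("  [1, 2, 3]"));
    }
}
```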

Here is a simple example:

JSON file:

{
"pageInfo": {
"pageName": "abc",
"pagePic": "http://example.com/content.jpg"
},
"posts": [
{
"post_id": "123456789012_123456789012",
"actor_id": "1234567890",
"picOfPersonWhoPosted": "http://example.com/photo.jpg",
"nameOfPersonWhoPosted": "Jane Doe",
"message": "Sounds cool. Can't wait to see it!",
"likesCount": "2",
"comments": [],
"timeOfPost": "1234567890"
}
]
}

Code example:

import org.json.*;

String jsonString = ... ; // assign your JSON string here
JSONObject obj = new JSONObject(jsonString);
String pageName = obj.getJSONObject("pageInfo").getString("pageName");

JSONArray arr = obj.getJSONArray("posts"); // "posts" is an array: "posts": [...]
for (int i = 0; i < arr.length(); i++) {
    String post_id = arr.getJSONObject(i).getString("post_id");
    ......
}

You can find more examples here: Parse JSON in Java.

Downloadable jar.
