I'm new to Spark and Scala, and I'm trying to learn Spark for one of my study projects. I have a JSON file that looks like this:
[
  {
    "year": 2012,
    "month": 8,
    "title": "Batman"
  },
  {
    "year": 2012,
    "month": 8,
    "title": "Hero"
  },
  {
    "year": 2012,
    "month": 7,
    "title": "Robot"
  }
]
I started by reading this JSON file into a Spark DataFrame, so I tried the following:
spark.read
  .option("multiline", true)
  .option("mode", "PERMISSIVE")
  .option("inferSchema", true)
  .json(filePath)
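It reads the JSON, but it splits the data into separate Spark columns (year, month, title). My requirement is to read each data object as a single column value.
I want to read it into a Spark DataFrame, and I expect the output to look like this: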
+----------------------------------------+
|json |
+----------------------------------------+
|{"year":2012,"month":8,"title":"Batman"}|
|{"year":2012,"month":8,"title":"Hero"} |
|{"year":2012,"month":7,"title":"Robot"} |
|{"year":2011,"month":7,"title":"Git"} |
+----------------------------------------+
Use toJSON:
val df = spark.read
  .option("multiline", true)
  .option("mode", "PERMISSIVE")
  .option("inferSchema", true)
  .json(filePath).toJSON
Now:
df.show(false)
+----------------------------------------+
|value |
+----------------------------------------+
|{"month":8,"title":"Batman","year":2012}|
|{"month":8,"title":"Hero","year":2012} |
|{"month":7,"title":"Robot","year":2012} |
+----------------------------------------+
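Two details in the output above differ from what the question asked for: the column is named value rather than json, and the keys come out in alphabetical order (month, title, year), which is the order the inferred schema appears to use. A minimal sketch of one way to address both, assuming the three fields shown in the question and a hypothetical file name movies.json: pass an explicit schema so the fields keep the order year, month, title, then rename the column with toDF. The object name and app name below are illustrative only.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{LongType, StringType, StructField, StructType}

object JsonRowsAsStrings {
  def main(args: Array[String]): Unit = {
    // Local session for experimentation; adjust master/appName for a real cluster.
    val spark = SparkSession.builder()
      .appName("json-rows-as-strings")
      .master("local[*]")
      .getOrCreate()

    // Hypothetical default path; pass the real path as the first argument.
    val filePath = args.headOption.getOrElse("movies.json")

    // Explicit schema (an assumption based on the sample data):
    // toJSON writes fields in schema order, so this keeps year, month, title.
    val schema = StructType(Seq(
      StructField("year", LongType),
      StructField("month", LongType),
      StructField("title", StringType)
    ))

    val df = spark.read
      .schema(schema)
      .option("multiLine", true) // the file is one JSON array spanning several lines
      .json(filePath)
      .toJSON                    // Dataset[String] with a single column named "value"
      .toDF("json")              // rename that column to "json"

    df.show(false)
    spark.stop()
  }
}
```

If the set of fields is not known in advance, keeping the inferred schema and calling .toJSON.toDF("json") still gives one JSON string per row; only the key order will follow the inferred schema.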