I am trying to load the following JSON file in Spark HQL, but it will not load and I get a _corrupt_record error.
Can anyone shed some light on this error? I can read the file and work with it in other applications, such as Notepad++ (with the JSTool plugin), so I believe it is valid and not corrupted.
{"markers": [
{
"point":new GLatLng(40.266044,-74.718479),
"homeTeam":"Lawrence Library",
"awayTeam":"LUGip",
"markerImage":"images/red.png",
"information": "Linux users group meets second Wednesday of each month.",
"fixture":"Wednesday 7pm",
"capacity":"",
"previousScore":""
},
{
"point":new GLatLng(40.211600,-74.695702),
"homeTeam":"Hamilton Library",
"awayTeam":"LUGip HW SIG",
"markerImage":"images/white.png",
"information": "Linux users can meet the first Tuesday of the month to work out harward and configuration issues.",
"fixture":"Tuesday 7pm",
"capacity":"",
"tv":""
},
{
"point":new GLatLng(40.294535,-74.682012),
"homeTeam":"Applebees",
"awayTeam":"After LUPip Mtg Spot",
"markerImage":"images/newcastle.png",
"information": "Some of us go there after the main LUGip meeting, drink brews, and talk.",
"fixture":"Wednesday whenever",
"capacity":"2 to 4 pints",
"tv":""
},
] }
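A minimal way to reproduce this, assuming the file above is saved as markers.json and loaded with spark.read.json (the exact load code is not shown in the question):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("corrupt-record-repro")
  .master("local[*]")
  .getOrCreate()

// Each physical line of the pretty-printed file is parsed as its own JSON
// document; none of them is a complete object, so Spark falls back to a
// single _corrupt_record string column.
val df = spark.read.json("markers.json")
df.printSchema()   // root
                   //  |-- _corrupt_record: string (nullable = true)
df.show(truncate = false)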
Your JSON should have one object per line:
{ object1 }
{ object2 }
Only this structure is supported by read.json by default; a sketch of that layout and how it is read follows.
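For example, a minimal sketch, assuming the markers are flattened to one object per line in a hypothetical markers.jsonl file (the point values use a JavaScript constructor, new GLatLng(...), which is not valid JSON, so they appear below as plain lat/lng fields):

// markers.jsonl -- one self-contained JSON object per physical line
// {"homeTeam":"Lawrence Library","awayTeam":"LUGip","lat":40.266044,"lng":-74.718479,"fixture":"Wednesday 7pm"}
// {"homeTeam":"Hamilton Library","awayTeam":"LUGip HW SIG","lat":40.211600,"lng":-74.695702,"fixture":"Tuesday 7pm"}

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("jsonlines").master("local[*]").getOrCreate()

// read.json produces one row per line, with columns inferred from the keys.
val markers = spark.read.json("markers.jsonl")
markers.printSchema()
markers.show(truncate = false)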
If you want to read multi-line JSON instead, you can load it with sparkContext.wholeTextFiles and parse it manually, as in the sketch below.
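A sketch of that manual route, assuming each file fits comfortably in memory; the newline-stripping "parse" step is only illustrative:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("wholeTextFiles-json").master("local[*]").getOrCreate()
val sc = spark.sparkContext
import spark.implicits._

// wholeTextFiles yields one (path, fullContent) pair per file, so the
// multi-line JSON stays together instead of being split line by line.
val rawFiles = sc.wholeTextFiles("markers.json").values

// "Manual parsing" here is just collapsing each file onto a single line and
// handing it back to read.json. This assumes the content is valid JSON once
// the line breaks are gone; the new GLatLng(...) values in the file above are
// JavaScript and would still have to be rewritten first.
val oneLinePerFile = rawFiles.map(_.replace("\n", " ")).toDS()
val df = spark.read.json(oneLinePerFile)

// The example file wraps everything in one top-level object, so this yields a
// single row with a "markers" array column; explode() can then give one row
// per marker.
df.printSchema()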
The documentation has this note:
Note that the file that is offered as a json file is not a typical JSON file. Each line must contain a separate, self-contained valid JSON object. As a consequence, a regular multi-line JSON file will most often fail.