SPARK - Cannot read multiline JSON (corrupt_record: string (nullable = true))



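I am looking for advice on the problem in the title. I read in the Databricks documentation (https://docs.databricks.com/spark/latest/data-sources/read-json.html) that I can read a multiline JSON file into a DataFrame with the following expression: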

println("2.2 Dataframe Multiline")
// MULTILINE MODE!!
val df2 = spark.read.option("multiline", "true").option("charset", "UTF-8").json("EXPORT1.json")
df2.printSchema()

This does not work for me. If I manually remove all the line breaks from the JSON, this is the resulting schema:

root
|-- results: array (nullable = true)
|    |-- element: struct (containsNull = true)
|    |    |-- address_components: array (nullable = true)
|    |    |    |-- element: struct (containsNull = true)
|    |    |    |    |-- long_name: string (nullable = true)
|    |    |    |    |-- short_name: string (nullable = true)
|    |    |    |    |-- types: array (nullable = true)
|    |    |    |    |    |-- element: string (containsNull = true)
|    |    |-- formatted_address: string (nullable = true)
|    |    |-- geometry: struct (nullable = true)
|    |    |    |-- bounds: struct (nullable = true)
|    |    |    |    |-- northeast: struct (nullable = true)
|    |    |    |    |    |-- lat: double (nullable = true)
|    |    |    |    |    |-- lng: double (nullable = true)
|    |    |    |    |-- southwest: struct (nullable = true)
|    |    |    |    |    |-- lat: double (nullable = true)
|    |    |    |    |    |-- lng: double (nullable = true)
|    |    |    |-- location: struct (nullable = true)
|    |    |    |    |-- lat: double (nullable = true)
|    |    |    |    |-- lng: double (nullable = true)
|    |    |    |-- location_type: string (nullable = true)
|    |    |    |-- viewport: struct (nullable = true)
|    |    |    |    |-- northeast: struct (nullable = true)
|    |    |    |    |    |-- lat: double (nullable = true)
|    |    |    |    |    |-- lng: double (nullable = true)
|    |    |    |    |-- southwest: struct (nullable = true)
|    |    |    |    |    |-- lat: double (nullable = true)
|    |    |    |    |    |-- lng: double (nullable = true)
|    |    |-- place_id: string (nullable = true)
|    |    |-- types: array (nullable = true)
|    |    |    |-- element: string (containsNull = true)
|-- status: string (nullable = true)

This is a sample of the JSON I downloaded from Google:

{
"results" : [
{
"address_components" : [
{
"long_name" : "30152",
"short_name" : "30152",
"types" : [ "postal_code" ]
},
{
"long_name" : "Murcia",
"short_name" : "Murcia",
"types" : [ "locality", "political" ]
},
{
"long_name" : "Murcia",
"short_name" : "MU",
"types" : [ "administrative_area_level_2", "political" ]
},
{
"long_name" : "Region of Murcia",
"short_name" : "Region of Murcia",
"types" : [ "administrative_area_level_1", "political" ]
},
{
"long_name" : "Spain",
"short_name" : "ES",
"types" : [ "country", "political" ]
}
],
"formatted_address" : "30152 Murcia, Spain",
"geometry" : {
"bounds" : {
"northeast" : {
"lat" : 37.9659196,
"lng" : -1.1346723
},
"southwest" : {
"lat" : 37.9442828,
"lng" : -1.1687921
}
},
"location" : {
"lat" : 37.9569734,
"lng" : -1.1496969
},
"location_type" : "APPROXIMATE",
"viewport" : {
"northeast" : {
"lat" : 37.9659196,
"lng" : -1.1346723
},
"southwest" : {
"lat" : 37.9442828,
"lng" : -1.1687921
}
}
},
"place_id" : "ChIJZbDcb0Z_Yw0RUK0TPnKvAhw",
"types" : [ "postal_code" ]
}
],
"status" : "OK"
}

Since I want to send many requests to Google, removing the line breaks by hand is not an option.

Can anyone help me? Thanks in advance.
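For reference, the difference between the two read modes can be reproduced on a small file. The following is only a sketch, assuming Spark 2.2 or later (where the `multiLine` option is available) and a local session; the object name `MultiLineJsonDemo`, its helper methods, and the reduced sample document are made up for illustration. In the default mode every physical line is parsed as its own JSON record, so a pretty-printed file typically ends up as `_corrupt_record` rows.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files
import org.apache.spark.sql.{DataFrame, SparkSession}

object MultiLineJsonDemo {
  // A tiny pretty-printed document shaped like the Geocoding response (hypothetical sample).
  val sample: String =
    """{
      |  "results" : [
      |    {
      |      "formatted_address" : "30152 Murcia, Spain"
      |    }
      |  ],
      |  "status" : "OK"
      |}""".stripMargin

  // Writes the sample to a temporary file and returns its path.
  def writeSample(): String = {
    val path = Files.createTempFile("geocode", ".json")
    Files.write(path, sample.getBytes(StandardCharsets.UTF_8))
    path.toString
  }

  // Default mode: each line of the file is treated as a separate JSON record.
  def readLineDelimited(spark: SparkSession, path: String): DataFrame =
    spark.read.json(path)

  // multiLine mode: each file is parsed as one whole JSON document.
  def readMultiLine(spark: SparkSession, path: String): DataFrame =
    spark.read.option("multiLine", "true").json(path)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("multiline-json-demo")
      .getOrCreate()
    val path = writeSample()
    readLineDelimited(spark, path).printSchema() // schema inferred line by line
    readMultiLine(spark, path).printSchema()     // schema of the whole document
    spark.stop()
  }
}
```

If the second schema still shows only `_corrupt_record`, one thing worth checking is the Spark version, since older releases do not know the option.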

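To solve this, what I did was store the JSON with all line breaks removed.

The following class takes the address, components, ... and writes the response of the geolocation request to a JSON file: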

import java.io.{BufferedWriter, File, FileWriter}
import scala.concurrent.Future
import scala.util.{Failure, Success}
import dispatch._, Defaults._

// Sends one Geocoding API request and writes the response to <JSONName>_LatLon.json.
class Geolocation(var Address: String, var Component: String, var APIKey: String, var JSONName: Int) {
  val GeoLocURL_REQ = "https://maps.googleapis.com/maps/api/geocode/json?address=" + Address + "&components=" + Component + "&key=" + APIKey
  val filename = JSONName.toString + "_LatLon.json"
  val file = new File(filename)
  val bw = new BufferedWriter(new FileWriter(file))
  val svc = url(GeoLocURL_REQ)
  val response: Future[String] = Http(svc OK as.String)
  response onComplete {
    case Success(content) =>
      println("worked!" + content)
      // Strip all whitespace so the JSON fits on one line (this also removes
      // spaces inside string values); replacing only "\n" works as well.
      bw.write(content.replaceAll("\\s", ""))
      //bw.write(content)
      bw.close()
    case Failure(t) =>
      println("failed:! " + t.getMessage)
      bw.close() // release the file handle on failure too
  }
}

val APIKey = "TYPE YOUR OWN API KEY HERE"
val PostalCode = 30152
val Localidad = "Murcia"
val Component = "postal_code=" + PostalCode + "%7Ccountry=ES" // "|" = %7C
val Address = Localidad + "+" + PostalCode
val geolocation = new Geolocation(Address, Component, APIKey, PostalCode)
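One caveat with stripping every whitespace character is that spaces inside string values go too, so "Region of Murcia" would be stored as "RegionofMurcia". A possible variation, sketched here in plain Scala with no extra libraries, removes whitespace only outside string literals. The object name `JsonMinifier` and its `minify` method are hypothetical, not part of dispatch or Spark, and the sketch assumes the input is already valid JSON.

```scala
object JsonMinifier {
  /** Removes whitespace outside string literals; text inside quotes is left untouched. */
  def minify(json: String): String = {
    val out = new StringBuilder(json.length)
    var inString = false // currently inside a "..." literal
    var escaped = false  // previous character was a backslash inside a string
    for (c <- json) {
      if (inString) {
        out.append(c)
        if (escaped) escaped = false          // the escaped character is consumed
        else if (c == '\\') escaped = true    // the next character is escaped
        else if (c == '"') inString = false   // closing quote
      } else if (c == '"') {
        inString = true
        out.append(c)
      } else if (!c.isWhitespace) {
        out.append(c)                         // structural character, digit, literal, ...
      }
    }
    out.toString
  }
}
```

In the class above this would replace the `replaceAll` call, for example `bw.write(JsonMinifier.minify(content))`.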

Hope this helps someone!
