I am trying to read JSON into a DataFrame and then convert it into a Dataset, but I am running into the issue below. Can anyone help me?
The personDF DataFrame was created successfully; its schema is shown below.
scala> personDF.printSchema();
root
|-- personDF: struct (nullable = true)
| |-- city: string (nullable = true)
| |-- line1: string (nullable = true)
| |-- postalCode: string (nullable = true)
| |-- state: string (nullable = true)
| |-- type1: string (nullable = true)
I created a case class to capture the DF above.
scala> case class address1(city:String,line1:String,postalCode:String,state:String,type1:String)
defined class address1
Below is the data that personDF currently holds:
scala> personzDF.show()
+--------------------+
| personDF|
+--------------------+
|[CENTERPORT,5 PRO...|
|[HUNTINGTON,94 JA...|
|[RIVERHEAD,9 PATT...|
|[NORTHPORT,50 LIS...|
|[NORTHPORT,24 LAU...|
|[NORTHPORT,340 SC...|
|[GREENLAWN,166 BR...|
|[MELVILLE,1 MERID...|
+--------------------+
Finally, when I create the Dataset, I get the error below.
scala> val ds = personDF.as[address1]
<console>:32: error: overloaded method value as with alternatives:
(alias: Symbol)org.apache.spark.sql.DataFrame <and>
(alias: String)org.apache.spark.sql.DataFrame
does not take type parameters
val ds = personDF.as[address1]
I googled around but could not find the cause.
Thanks, Sivaram
Actually your personDF holds an array of structType values, which is not the flat structure you might expect from printSchema():
|-- personDF: struct (nullable = true)
Are you trying to convert personzDF to type address1? Then try something like this:
val ds = personzDF.map(rec => rec.getStruct(0))          // unwrap the nested personDF struct from each Row
  .map(rec => address1(rec.getString(0), rec.getString(1),
    rec.getString(2), rec.getString(3), rec.getString(4)))
  .toDF()
Hope this helps.
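As a simpler alternative, a minimal sketch assuming a Spark 2.x `SparkSession` named `spark` (a hypothetical name here, adjust to your environment): flatten the nested struct with `select("personDF.*")` and let the built-in product encoder do the conversion. Note that the typed `as[T]` needs `import spark.implicits._` in scope:

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical session setup; in spark-shell a session already exists.
val spark = SparkSession.builder().appName("json-to-ds").getOrCreate()
import spark.implicits._ // brings Encoder[address1] into scope

case class address1(city: String, line1: String, postalCode: String,
                    state: String, type1: String)

val personDF = spark.read.json("person.json") // assumed input path

// Promote the fields of the nested personDF struct to top-level
// columns, then convert to a typed Dataset[address1].
val ds = personDF.select("personDF.*").as[address1]
ds.show()
```

If `as` still fails with "does not take type parameters", the Spark version in use predates the typed Dataset API (pre-1.6), and the map-based approach above is the way to go.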