Existing columns of df:
|-- col1: string (nullable = true)
|-- col2: string (nullable = true)
|-- col3: struct (nullable = true)
| |-- col3_1: struct (nullable = true)
| | |-- colA: string (nullable = true)
| |-- col3_2: struct (nullable = true)
| | |-- colB: string (nullable = true)
|-- col4: string (nullable = true)
|-- col5: string (nullable = true)
I only need to read the following columns:
col1, col2, col3
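If the full schema is already available (for example from `df.schema`), the three columns could also be selected from it instead of being typed out by hand. The sketch below assumes that; `fullSchema` is a hypothetical stand-in rebuilt from the printed schema so that it runs without a SparkSession. Filtering keeps whole top-level fields, so `col3` keeps its nested structure.

```scala
import org.apache.spark.sql.types._

object PruneSchema {
  // Hypothetical stand-in for df.schema, rebuilt from the printed schema.
  val fullSchema: StructType = StructType(Seq(
    StructField("col1", StringType),
    StructField("col2", StringType),
    StructField("col3", StructType(Seq(
      StructField("col3_1", StructType(Seq(StructField("colA", StringType)))),
      StructField("col3_2", StructType(Seq(StructField("colB", StringType))))
    ))),
    StructField("col4", StringType),
    StructField("col5", StringType)
  ))

  // Keep only the wanted top-level fields; nested structs are kept whole.
  val wanted: Set[String] = Set("col1", "col2", "col3")
  val pruned: StructType = StructType(fullSchema.fields.filter(f => wanted.contains(f.name)))

  def main(args: Array[String]): Unit = println(pruned.treeString)
}
```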
For the first two columns, I can create the following schema:
val schema = StructType(Array(StructField("col1", StringType), StructField("col2", StringType)))
Schema with the nested struct:
StructType(Array(StructField("col1", StringType),
  StructField("col3", StructType(
    StructField("col3_1", StructType(StructField("colA", StringType))),
    StructField("col3_2", StructType(StructField("colB", StringType)))))))
Error:
error: overloaded method value apply with alternatives:
Any suggestions for creating a schema for the nested struct?
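As far as I can tell, the error arises because `StructType.apply` is only overloaded for a `Seq[StructField]`, a `java.util.List[StructField]` and (via the case class) an `Array[StructField]`; none of them accepts bare `StructField` arguments, so each nested `StructType(...)` needs its fields wrapped in a collection. The sketch below shows three equivalent ways to build the same schema: the attempt above with `Array(...)` added, the `StructType.add` builder, and `StructType.fromDDL` (available since Spark 2.2). The object and value names are illustrative only.

```scala
import org.apache.spark.sql.types._

object NestedSchemaFixes {
  // 1. The attempt from the question, with every nested field list wrapped
  //    in Array(...) so the Array[StructField] overload applies.
  val viaArray: StructType = StructType(Array(
    StructField("col1", StringType),
    StructField("col2", StringType),
    StructField("col3", StructType(Array(
      StructField("col3_1", StructType(Array(StructField("colA", StringType)))),
      StructField("col3_2", StructType(Array(StructField("colB", StringType))))
    )))
  ))

  // 2. The same schema built with the StructType.add builder (nullable defaults to true).
  val viaAdd: StructType = new StructType()
    .add("col1", StringType)
    .add("col2", StringType)
    .add("col3", new StructType()
      .add("col3_1", new StructType().add("colA", StringType))
      .add("col3_2", new StructType().add("colB", StringType)))

  // 3. The same schema parsed from a DDL string.
  val viaDdl: StructType = StructType.fromDDL(
    "col1 STRING, col2 STRING, " +
      "col3 STRUCT<col3_1: STRUCT<colA: STRING>, col3_2: STRUCT<colB: STRING>>")

  def main(args: Array[String]): Unit = println(viaAdd.treeString)
}
```

The builder and DDL forms avoid the nested-parenthesis bookkeeping that caused the original problem.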
You should try something like the following, or declare a case class for col3 and substitute it in the schema:
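The case-class route could look roughly like this. It is a sketch that assumes `Encoders.product` is used to derive the `StructType`; the class names `Col3_1`, `Col3_2`, `Col3` and `Record` are hypothetical, and their field names must match the column names exactly. `String` fields come out as nullable `StringType`, and nested case classes become nested structs.

```scala
import org.apache.spark.sql.Encoders
import org.apache.spark.sql.types.StructType

// Hypothetical case classes mirroring the columns to keep.
case class Col3_1(colA: String)
case class Col3_2(colB: String)
case class Col3(col3_1: Col3_1, col3_2: Col3_2)
case class Record(col1: String, col2: String, col3: Col3)

object CaseClassSchema {
  // Derive the StructType from the case class fields, including nested ones.
  val schema: StructType = Encoders.product[Record].schema

  def main(args: Array[String]): Unit = println(schema.treeString)
}
```

The derived schema can then be passed to `spark.read.schema(...)`, and with `import spark.implicits._` the result can be turned into a typed `Dataset[Record]` via `.as[Record]`.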
import org.apache.spark.sql.types._

val schema = StructType(Seq(
  StructField("col1", StringType, true),
  StructField("col2", StringType, true),
  StructField("col3", StructType(Seq(
    StructField("col3_1", StructType(Seq(
      StructField("colA", StringType, true)
    ))),
    StructField("col3_2", StructType(Seq(
      StructField("colB", StringType, true)
    )))
  )))
))
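A possible way to use such a schema is to pass it to the reader so only the listed columns come back. The question does not say what the source format is; this sketch assumes JSON Lines and writes one hypothetical sample record to a temp file so it runs on its own (for Parquet, `spark.read.schema(schema).parquet(path)` would be the analogue). The schema here repeats the answer's structure with `col1` typed as a string, matching the printed schema.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files

import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types._

object ReadWithSchema {
  val schema: StructType = StructType(Seq(
    StructField("col1", StringType, true),
    StructField("col2", StringType, true),
    StructField("col3", StructType(Seq(
      StructField("col3_1", StructType(Seq(StructField("colA", StringType, true)))),
      StructField("col3_2", StructType(Seq(StructField("colB", StringType, true))))
    )))
  ))

  // Assumption: JSON Lines input; the sample record is hypothetical.
  def readSample(spark: SparkSession): DataFrame = {
    val sample =
      """{"col1":"a","col2":"b","col3":{"col3_1":{"colA":"x"},"col3_2":{"colB":"y"}},"col4":"c","col5":"d"}"""
    val path = Files.createTempFile("nested", ".json")
    Files.write(path, sample.getBytes(StandardCharsets.UTF_8))
    // With an explicit schema, col4 and col5 do not appear in the result.
    spark.read.schema(schema).json(path.toString)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("nested-schema").getOrCreate()
    val df = readSample(spark)
    df.printSchema()
    // Nested leaves are addressed with dot notation.
    df.select("col1", "col3.col3_1.colA", "col3.col3_2.colB").show()
    spark.stop()
  }
}
```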