Creating a nested struct schema in Spark



Existing columns of df:

|-- col1: string (nullable = true)
|-- col2: string (nullable = true)
|-- col3: struct (nullable = true)
|    |-- col3_1: struct (nullable = true)
|    |    |-- colA: string (nullable = true)
|    |-- col3_2: struct (nullable = true)
|    |    |-- colB: string (nullable = true)
|-- col4: string (nullable = true)
|-- col5: string (nullable = true)

I only need to read the following columns:

col1, col2, col3

For the first two columns, I can create the following schema:

val schema = StructType(Array(StructField("col1", StringType), StructField("col2", LongType)))

Schema for the nested struct:

StructType(Array(StructField("col1", StringType), 
StructField("col3", StructType(StructField("col3_1",StructType(StructField("colA",StringType))),StructField("col3_2",StructType(StructField("colB",StringType)))))

Error:

error: overloaded method value apply with alternatives:

Any suggestions for creating a schema for a nested struct?

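You should try something like the following, or declare a case class for col3 and use it in the schema. The error arises because StructType.apply accepts a collection of StructField (a Seq, an Array, or a java.util.List), not StructField values passed as separate arguments, so each level of nesting needs its own Seq: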

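import org.apache.spark.sql.types._

val schema = StructType(Seq(
    StructField("col1", StringType, false),  // string in the printed schema of df
    StructField("col2", StringType, false),
    // Each nested struct is itself a StructType wrapping a Seq of StructField
    StructField("col3", StructType(Seq(
        StructField("col3_1", StructType(Seq(
            StructField("colA", StringType, false)
        ))),
        StructField("col3_2", StructType(Seq(
            StructField("colB", StringType, false)
        )))
    )))
))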
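
The case class alternative mentioned above could look roughly like the sketch below. The case class names (Col31, Col32, Col3, Record) and the wrapper object are hypothetical and chosen only for illustration; the field names are taken from the printed schema of df. It assumes Spark SQL is on the classpath and relies on Encoders.product to derive the StructType from the case class fields.

```scala
import org.apache.spark.sql.Encoders
import org.apache.spark.sql.types.StructType

// Hypothetical case classes mirroring the columns to be read.
// They are declared at the top level so that Spark can derive encoders for them.
case class Col31(colA: String)
case class Col32(colB: String)
case class Col3(col3_1: Col31, col3_2: Col32)
case class Record(col1: String, col2: String, col3: Col3)

object NestedSchemaFromCaseClass {
  // Encoders.product builds an encoder for a case class;
  // its schema is the StructType that matches the case class fields.
  val schema: StructType = Encoders.product[Record].schema

  def main(args: Array[String]): Unit = {
    // Print the derived schema as a tree to compare it with df.printSchema().
    schema.printTreeString()
  }
}
```

The derived schema can then be passed wherever a hand-written StructType would go, for example to the schema method of a DataFrameReader. Deriving it from case classes avoids the nested StructType(Seq(...)) calls, at the cost of declaring one case class per struct level.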
