I have a DataFrame with the following schema:
root
|-- ts: timestamp (nullable = true)
|-- address_list: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- id: string (nullable = true)
| | |-- active: integer (nullable = true)
| | |-- address: array (nullable = true)
| | | |-- element: struct (containsNull = true)
| | | | |-- street: string (nullable = true)
| | | | |-- city: long (nullable = true)
| | | | |-- state: integer (nullable = true)
I want to add a new field street_2 to the nested column address_list.address, between street and city.
Here is the expected schema:
root
|-- ts: timestamp (nullable = true)
|-- address_list: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- id: string (nullable = true)
| | |-- active: integer (nullable = true)
| | |-- address: array (nullable = true)
| | | |-- element: struct (containsNull = true)
| | | | |-- street: string (nullable = true)
| | | | |-- street_2: string (nullable = true)
| | | | |-- city: long (nullable = true)
| | | | |-- state: integer (nullable = true)
I did try using transform, but it added the street_2 field at the end of each address_list element instead of inside address:
df
  .withColumn("address_list", transform(col("address_list"), x => x.withField("street_2", lit(null).cast("string"))))
root
|-- ts: timestamp (nullable = true)
|-- address_list: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- id: string (nullable = true)
| | |-- active: integer (nullable = true)
| | |-- address: array (nullable = true)
| | | |-- element: struct (containsNull = true)
| | | | |-- street: string (nullable = true)
| | | | |-- city: long (nullable = true)
| | | | |-- state: integer (nullable = true)
| | |-- street_2: string (nullable = true)
I want it inside address, inserted between street and city.
You can try this:
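import org.apache.spark.sql.functions.{col, lit, transform}

// Schema before the change
data.printSchema
// A dotted name walks through nested structs: person -> details -> age
val result = data.withColumn(
  "person_details",
  transform(col("person_details"), x => x.withField("person.details.age", lit(40))))
// Schema after the change: age is appended at the end of details
result.printSchema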
root
|-- person_details: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- person: struct (nullable = true)
| | | |-- name: string (nullable = true)
| | | |-- details: struct (nullable = true)
| | | | |-- city: string (nullable = true)
| | | | |-- income: long (nullable = false)
root
|-- person_details: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- person: struct (nullable = true)
| | | |-- name: string (nullable = true)
| | | |-- details: struct (nullable = true)
| | | | |-- city: string (nullable = true)
| | | | |-- income: long (nullable = false)
| | | | |-- age: integer (nullable = false)
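For the schema in the question, note that withField adds a new field at the end of the struct, and as far as I know a dotted field name only walks through nested structs, not through an array such as address. One possible way around both points is a second transform over address and, if the field order matters, rebuilding the inner struct field by field. This is only a minimal sketch: the one-row sample and the names al, a, appended and ordered are made up for illustration, and it assumes Spark 3.1 or later, where Column.withField is available.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, lit, struct, transform}

// Reuses the active session in spark-shell; otherwise starts a local one.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("nested-field-sketch")
  .getOrCreate()

// Hypothetical one-row sample with the question's nested shape (ts omitted for brevity).
val df = spark.sql(
  """SELECT array(named_struct(
    |  'id', 'a1',
    |  'active', 1,
    |  'address', array(named_struct('street', 'Main St', 'city', 10L, 'state', 5))
    |)) AS address_list""".stripMargin)

// Variant 1: nest a second transform so withField runs on each address element.
// street_2 ends up inside address, but as the last field.
val appended = df.withColumn(
  "address_list",
  transform(col("address_list"), al =>
    al.withField(
      "address",
      transform(al.getField("address"), a =>
        a.withField("street_2", lit(null).cast("string"))))))

// Variant 2: rebuild each address element explicitly to control the field order.
// Every existing field has to be listed by hand.
val ordered = df.withColumn(
  "address_list",
  transform(col("address_list"), al =>
    al.withField(
      "address",
      transform(al.getField("address"), a =>
        struct(
          a.getField("street").as("street"),
          lit(null).cast("string").as("street_2"),
          a.getField("city").as("city"),
          a.getField("state").as("state"))))))

appended.printSchema()
ordered.printSchema()
```

Replacing address through withField keeps it in its original position inside the address_list element, so id and active stay where they were. One thing to watch with the second variant: struct(...) builds a new struct for every element, so an address element that was null before would likely come out as a struct of null fields, and the nullable flags in the printed schema may differ from the original.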
I got help from this post: https://medium.com/@fqaiser94/manipulating-nested-data-just-got-easier-in-apache-spark-3-1-1-f88bc9003827