How to cast a column with nested arrays in PySpark



I have this schema in PySpark:

root
|-- SortedLenders: array (nullable = true)
|    |-- element: struct (containsNull = true)
|    |    |-- LenderID: string (nullable = true)
|    |    |-- MaxProfit: string (nullable = true)
|-- FilteredOutDecisions: array (nullable = true)
|    |-- element: struct (containsNull = true)
|    |    |-- ApprovedAmount: integer (nullable = true)
|    |    |-- Reasons: array (nullable = true)
|    |    |    |-- element: integer (containsNull = true)

How can I cast the FilteredOutDecisions.Reasons column to double? Thanks in advance!

Try this:

from pyspark.sql import functions as f

# Rebuild each struct, casting every element of Reasons to double
df = df.withColumn(
    'newFilteredOutDecisions',
    f.expr('transform(FilteredOutDecisions, element -> struct(element.ApprovedAmount as ApprovedAmount, transform(element.Reasons, value -> cast(value as double)) as Reasons))'),
)
