How to cross join unnest in PySpark?



I have a table:

Department   Name         Start        End
Finance      John Doe     01/01/2022   01/05/2002
Marketing    Mark Smith   05/02/2022   08/03/2002

How can I unpivot the Start and End date columns into one (Event, Date) row each, the way CROSS JOIN UNNEST works in Presto?

With the sample data you provided, stack can do this:

from pyspark.sql import functions as F

df = spark.createDataFrame(
    [('Finance', 'John Doe', '01/01/2022', '01/05/2002'),
     ('Marketing', 'Mark Smith', '05/02/2022', '08/03/2002')],
    ['Department', 'Name', 'Start', 'End'])

# stack(2, ...) unpivots the two date columns into (Event, Date) pairs,
# emitting one row per event per input row.
df = df.select('Department', 'Name', F.expr("stack(2, 'Start', Start, 'End', End) as (Event, Date)"))
df.show()
# +----------+----------+-----+----------+
# |Department|      Name|Event|      Date|
# +----------+----------+-----+----------+
# |   Finance|  John Doe|Start|01/01/2022|
# |   Finance|  John Doe|  End|01/05/2002|
# | Marketing|Mark Smith|Start|05/02/2022|
# | Marketing|Mark Smith|  End|08/03/2002|
# +----------+----------+-----+----------+
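For reference, the closest literal analogue of Presto's CROSS JOIN UNNEST in PySpark is explode over an array of structs. Here is a minimal sketch of the same unpivot, applied to the original four-column df (before it is overwritten by the stack select above):

from pyspark.sql import functions as F

# Wrap each date column in an (Event, Date) struct, collect the structs
# into an array, and explode the array into one row per event.
df2 = df.select(
    'Department', 'Name',
    F.explode(F.array(
        F.struct(F.lit('Start').alias('Event'), F.col('Start').alias('Date')),
        F.struct(F.lit('End').alias('Event'), F.col('End').alias('Date')),
    )).alias('tmp')
).select('Department', 'Name', 'tmp.Event', 'tmp.Date')

df2.show()  # same four rows as the stack version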
