How to group DataFrame rows into a single delimiter-separated row in Scala Spark



I have this Spark DataFrame:

+------+------+
|father|child |
+------+------+
|Aaron |Adam  |
|Aaron |Berel |
|Aaron |Kasper|
|Levi  |Saul  |
|Levi  |Tiger |
+------+------+

How can I group by father and concatenate all the values into a single field with a delimiter?

The output I want is:

+------------------------+
|union_all_name_by_father|
+------------------------+
|Aaron;Adam;Berel;Kasper |
|Levi;Saul;Tiger         |
+------------------------+

You can use groupBy followed by concat_ws:

import org.apache.spark.sql.functions.{col, collect_list, concat_ws}

val df2 = df.groupBy("father").agg(
    concat_ws(";", collect_list(col("child"))).as("col2")
  ).select(concat_ws(";", col("father"), col("col2")).as("union_all_name_by_father"))
df2.show(false)
+------------------------+
|union_all_name_by_father|
+------------------------+
|Aaron;Adam;Berel;Kasper |
|Levi;Saul;Tiger         |
+------------------------+
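For reference, here is a self-contained sketch of the whole pipeline. The SparkSession setup, the object name, and the inline sample data are assumptions added for illustration; the aggregation itself is the same as in the answer above.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, collect_list, concat_ws}

object GroupConcatExample extends App {
  // Local session for illustration only (an assumption, not part of the answer)
  val spark = SparkSession.builder()
    .appName("GroupConcatExample")
    .master("local[*]")
    .getOrCreate()
  import spark.implicits._

  val df = Seq(
    ("Aaron", "Adam"), ("Aaron", "Berel"), ("Aaron", "Kasper"),
    ("Levi", "Saul"), ("Levi", "Tiger")
  ).toDF("father", "child")

  // Collect all children per father into one delimited string,
  // then prepend the father with the same delimiter
  val df2 = df.groupBy("father").agg(
      concat_ws(";", collect_list(col("child"))).as("children")
    )
    .select(concat_ws(";", col("father"), col("children")).as("union_all_name_by_father"))

  df2.show(false)
  spark.stop()
}
```

Note that collect_list does not guarantee the order of the collected values after a shuffle; if you need a deterministic order, wrapping the result in sort_array before concat_ws is one option.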
