Combining the results of 2 DataFrames generated in a for loop for 2 input values
Here are the DataFrames:
DF from the first iteration of the loop:
+--------+-------------------------------+---+
|order_id|Diff                           |id |
+--------+-------------------------------+---+
|12      |order_status                   |1  |
|1       |order_customer_id order_status |1  |
|68885   |New row in DataFrame 2         |1  |
|68886   |New row in DataFrame 2         |1  |
|2       |order_customer_id              |1  |
+--------+-------------------------------+---+
DF from the second iteration of the loop:
+--------+-------------------------------+---+
|order_id|Diff                           |id |
+--------+-------------------------------+---+
|12      |order_status                   |2  |
|1       |order_customer_id order_status |2  |
|68885   |New row in DataFrame 2         |2  |
|68886   |New row in DataFrame 2         |2  |
|2       |order_customer_id              |2  |
+--------+-------------------------------+---+
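I want to combine the two DataFrames above at the end. There may be more than two, so the final result should be a single combined DF. Does anyone have logic for this?
Suppose you have the following loop that generates a sequence of DataFrames: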
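import org.apache.spark.sql.DataFrame
import spark.implicits._ // assumes a SparkSession named spark is in scope, as in spark-shell
// Each inner List becomes a one-row DataFrame with columns a and b
val dfs: Seq[DataFrame] = List(List((1,1)), List((2,2)), List((3,3))).map(l => l.toDF("a","b"))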
You can use the union function to combine them:
val combinedDf = dfs.reduce(_ union _)
combinedDf.show()
+---+---+
| a| b|
+---+---+
| 1| 1|
| 2| 2|
| 3| 3|
+---+---+
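Applied to the DataFrames in the question, the same idea could look like the sketch below. It is only an illustration under stated assumptions: diffFor is a hypothetical stand-in for whatever comparison produces the per-value DataFrame, and the local SparkSession settings are placeholders. It uses unionByName rather than union because union matches columns by position while unionByName matches them by name, and reduceOption rather than reduce because reduce throws on an empty collection.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.lit

object CombineLoopResults {
  // Hypothetical stand-in for the per-value comparison in the question:
  // returns a small DataFrame tagged with the loop value in column "id".
  def diffFor(spark: SparkSession, id: Int): DataFrame = {
    import spark.implicits._
    Seq(
      (12, "order_status"),
      (1, "order_customer_id order_status")
    ).toDF("order_id", "Diff").withColumn("id", lit(id))
  }

  // Union all DataFrames by column name; None if the loop produced nothing.
  def combineAll(dfs: Seq[DataFrame]): Option[DataFrame] =
    dfs.reduceOption((a, b) => a.unionByName(b))

  def main(args: Array[String]): Unit = {
    // Placeholder local session settings for trying the sketch out.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("combine-loop-results")
      .getOrCreate()

    val inputs = Seq(1, 2) // may hold more than two values
    val perValue = inputs.map(id => diffFor(spark, id))

    // Show the combined DataFrame only if at least one was produced.
    combineAll(perValue).foreach(_.show(truncate = false))

    spark.stop()
  }
}
```

Mapping over the input values and reducing once at the end avoids keeping a mutable DataFrame variable that is reassigned inside the loop, and it works unchanged for any number of inputs.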