How to combine two Spark DataFrames in sorted order



I want to combine two DataFrames, a and b, into a DataFrame c that is sorted on a column.

val a = Seq(("a", 1), ("c", 2), ("e", 3)).toDF("char", "num")
val b = Seq(("b", 4), ("d", 5)).toDF("char", "num")
val c = // how do I sort on char column?

This is the result I want:

 a.show()     b.show()      c.show()
+----+---+   +----+---+    +----+---+
|char|num|   |char|num|    |char|num|
+----+---+   +----+---+    +----+---+
|   a|  1|   |   b|  4|    |   a|  1|
|   c|  2|   |   d|  5|    |   b|  4|
|   e|  3|   +----+---+    |   c|  2|
+----+---+                 |   d|  5|
                           |   e|  3|
                           +----+---+

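Simply put, you can call sort() on each DataFrame and again on the result of union(). Only the final sort() determines the row order of c, so sorting a and b beforehand is not required for the result.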

val a = Seq(("a", 1), ("c", 2), ("e", 3)).toDF("char", "num").sort($"char")
val b = Seq(("b", 4), ("d", 5)).toDF("char", "num").sort($"char")
val c = a.union(b).sort($"char")
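One detail worth knowing is that union() matches columns by position, not by name. The sketch below is a minimal variant that uses unionByName() (available from Spark 2.3) and a single sort at the end. The helper name sortedUnion, the object name, and the local SparkSession settings are illustrative assumptions, not part of the original answer.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object SortedUnionExample {
  // Hypothetical helper: combines two DataFrames by column name,
  // then sorts the combined rows once on the given column.
  def sortedUnion(a: DataFrame, b: DataFrame, sortColumn: String): DataFrame =
    a.unionByName(b).orderBy(sortColumn)

  def main(args: Array[String]): Unit = {
    // Assumption: a local session for trying this outside spark-shell.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("sorted-union")
      .getOrCreate()
    import spark.implicits._

    val a = Seq(("a", 1), ("c", 2), ("e", 3)).toDF("char", "num")
    val b = Seq(("b", 4), ("d", 5)).toDF("char", "num")

    val c = sortedUnion(a, b, "char")
    c.show()

    spark.stop()
  }
}
```

If both inputs are known to have identical column order, plain union() behaves the same way here; unionByName() mainly guards against frames whose columns were declared in a different order.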

If you need to union more than two DataFrames, you can try the following approach.

val df1 = sc.parallelize(List(
  (50, 2, "arjun"),
  (34, 4, "bob")
)).toDF("age", "children", "name")
val df2 = sc.parallelize(List(
  (51, 3, "jane"),
  (35, 5, "bob")
)).toDF("age", "children", "name")
val df3 = sc.parallelize(List(
  (50, 2, "arjun"),
  (34, 4, "bob")
)).toDF("age", "children", "name")

val result = Seq(df1, df2, df3)
val res_union = result.reduce(_ union _).sort($"age", $"name", $"children")
res_union.show()
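Two things may be worth handling when folding a list of DataFrames this way: reduce throws on an empty Seq, and union() keeps duplicate rows (df1 and df3 above contain the same rows). The sketch below is one possible way to cover both, using reduceOption from the Scala collections and distinct(). The helper name unionAll and the local SparkSession settings are assumptions made for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object UnionManyExample {
  // Hypothetical helper: unions all frames by position.
  // Returns None for an empty input instead of throwing.
  def unionAll(frames: Seq[DataFrame]): Option[DataFrame] =
    frames.reduceOption(_ union _)

  def main(args: Array[String]): Unit = {
    // Assumption: a local session for trying this outside spark-shell.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("union-many")
      .getOrCreate()
    import spark.implicits._

    val df1 = Seq((50, 2, "arjun"), (34, 4, "bob")).toDF("age", "children", "name")
    val df2 = Seq((51, 3, "jane"), (35, 5, "bob")).toDF("age", "children", "name")
    val df3 = Seq((50, 2, "arjun"), (34, 4, "bob")).toDF("age", "children", "name")

    unionAll(Seq(df1, df2, df3)).foreach { all =>
      // distinct() drops the rows that df1 and df3 have in common;
      // leave it out if duplicates should be kept.
      all.distinct().sort($"age", $"name", $"children").show()
    }

    spark.stop()
  }
}
```

Whether to call distinct() depends on the data: it triggers an extra shuffle, so it is only worth adding when duplicate rows are actually unwanted.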
