Is there a way to join more than two datasets at once?



I have 4 datasets with different schemas, and I need to left-anti join them. Rather than joining them one by one, is there a way to join them all at once?

This is a nested (chained) join in Spark 2.4.3. I just made up some sample data to give you an idea of how to implement nested joins.

First DataFrame:

scala>    val someDF = Seq(
("user1", "math","algebra-1","90"),
("user1", "physics","gravity","70"),
("user3", "biology","health","50"),
("user2", "biology","health","100"),
("user1", "math","algebra-1","40"),
("user2", "physics","gravity-2","20")
).toDF("user_id", "course_id","lesson_name","score")
scala> someDF.show
+-------+---------+-----------+-----+
|user_id|course_id|lesson_name|score|
+-------+---------+-----------+-----+
|  user1|     math|  algebra-1|   90|
|  user1|  physics|    gravity|   70|
|  user3|  biology|     health|   50|
|  user2|  biology|     health|  100|
|  user1|     math|  algebra-1|   40|
|  user2|  physics|  gravity-2|   20|
+-------+---------+-----------+-----+

Second DataFrame:

scala> val someDF2 = Seq(("math",121),("physics",122),("biology",123)).toDF("sid","rno")
scala> someDF2.show
+-------+---+
|    sid|rno|
+-------+---+
|   math|121|
|physics|122|
|biology|123|
+-------+---+

Third DataFrame:

scala> val someDF3 = Seq((121,"G-1"),(122,"G-2"),(123,"G-3")).toDF("rno","grade")
scala> someDF3.show
+---+-----+
|rno|grade|
+---+-----+
|121|  G-1|
|122|  G-2|
|123|  G-3|
+---+-----+
scala> someDF.join(someDF2,col("course_id")===col("sid"),"inner").join(someDF3,Seq("rno"),"inner").show
+---+-------+---------+-----------+-----+-------+-----+
|rno|user_id|course_id|lesson_name|score|    sid|grade|
+---+-------+---------+-----------+-----+-------+-----+
|121|  user1|     math|  algebra-1|   90|   math|  G-1|
|122|  user1|  physics|    gravity|   70|physics|  G-2|
|123|  user3|  biology|     health|   50|biology|  G-3|
|123|  user2|  biology|     health|  100|biology|  G-3|
|121|  user1|     math|  algebra-1|   40|   math|  G-1|
|122|  user2|  physics|  gravity-2|   20|physics|  G-2|
+---+-------+---------+-----------+-----+-------+-----+
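The session above relies on names that spark-shell imports automatically at startup, such as `spark.implicits._` (for `toDF`) and `col`. Below is a minimal standalone sketch of the same chain, assuming a local `SparkSession`; the object name `NestedJoinExample` and the helper `buildJoined` are hypothetical. Because DataFrame transformations are lazy, the two `join` calls only build one logical plan, which Spark optimizes and runs as a single query when an action such as `show` is called. `explain()` prints that combined plan.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object NestedJoinExample {
  // Hypothetical helper: builds the same three-way join as the REPL session.
  // Nothing is executed here; the joins only extend the logical plan.
  def buildJoined(spark: SparkSession): DataFrame = {
    import spark.implicits._ // enables Seq(...).toDF outside spark-shell

    val someDF = Seq(
      ("user1", "math", "algebra-1", "90"),
      ("user1", "physics", "gravity", "70"),
      ("user3", "biology", "health", "50"),
      ("user2", "biology", "health", "100"),
      ("user1", "math", "algebra-1", "40"),
      ("user2", "physics", "gravity-2", "20")
    ).toDF("user_id", "course_id", "lesson_name", "score")
    val someDF2 = Seq(("math", 121), ("physics", 122), ("biology", 123)).toDF("sid", "rno")
    val someDF3 = Seq((121, "G-1"), (122, "G-2"), (123, "G-3")).toDF("rno", "grade")

    someDF
      .join(someDF2, col("course_id") === col("sid"), "inner") // condition join on differently named keys
      .join(someDF3, Seq("rno"), "inner")                       // using-join: `rno` appears once, as the first column
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("nested-join").getOrCreate()
    val joined = buildJoined(spark)
    joined.explain() // prints one physical plan covering both joins
    joined.show()
    spark.stop()
  }
}
```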

The data itself is meaningless, but it should serve your purpose. Let me know if you have any questions.
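For the question's case (four datasets with different schemas, left-anti join), note that `Dataset.join` takes a single right-hand dataset, so the joins are still chained; with a variable number of datasets the chain can be built with a fold. The sketch below assumes each extra dataset is matched against one key column of the main dataset; the helper `leftAntiAll`, the object name, and the sample data are hypothetical. Because `left_anti` returns only the left side's columns, the accumulated DataFrame keeps the main dataset's schema after every step, so each step can still resolve the main dataset's key by name.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object MultiLeftAnti {
  // Hypothetical helper: applies one left_anti join per (dataset, baseKey, otherKey) triple.
  // left_anti keeps only the left side's columns, so `base`'s schema survives every step
  // and each baseKey can always be resolved on the accumulated DataFrame.
  def leftAntiAll(base: DataFrame, others: Seq[(DataFrame, String, String)]): DataFrame =
    others.foldLeft(base) { case (acc, (other, baseKey, otherKey)) =>
      acc.join(other, acc(baseKey) === other(otherKey), "left_anti")
    }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("multi-left-anti").getOrCreate()
    import spark.implicits._

    // Hypothetical sample data: a main dataset plus three datasets with different schemas.
    val users = Seq(("u1", "math"), ("u2", "physics"), ("u3", "biology"), ("u4", "art"))
      .toDF("user_id", "course_id")
    val banned  = Seq("u1").toDF("banned_user")
    val retired = Seq(("physics", 2019)).toDF("course", "year")
    val flagged = Seq(("u3", "spam")).toDF("uid", "reason")

    // Keep only users that match none of the other three datasets.
    val remaining = leftAntiAll(users, Seq(
      (banned,  "user_id",   "banned_user"),
      (retired, "course_id", "course"),
      (flagged, "user_id",   "uid")
    ))
    remaining.show()
    spark.stop()
  }
}
```

With this sample data, `u1` is excluded by `banned`, `u2` by `retired` (its course is `physics`), and `u3` by `flagged`, so only the `u4` row should remain. If a step needs a composite key, the triple could be replaced by a function that builds the join `Column` from the two DataFrames.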
