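I have 4 datasets with different schemas, and I need to left-anti join them together. Rather than joining them one by one, I would like to know whether there is a way to join all of them at once.
This is a nested (chained) join on Spark 2.4.3. I just picked some arbitrary data to give you an idea of how to implement a nested join.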
First DataFrame
scala> val someDF = Seq(
("user1", "math","algebra-1","90"),
("user1", "physics","gravity","70"),
("user3", "biology","health","50"),
("user2", "biology","health","100"),
("user1", "math","algebra-1","40"),
("user2", "physics","gravity-2","20")
).toDF("user_id", "course_id","lesson_name","score")
scala> someDF.show
+-------+---------+-----------+-----+
|user_id|course_id|lesson_name|score|
+-------+---------+-----------+-----+
|  user1|     math|  algebra-1|   90|
|  user1|  physics|    gravity|   70|
|  user3|  biology|     health|   50|
|  user2|  biology|     health|  100|
|  user1|     math|  algebra-1|   40|
|  user2|  physics|  gravity-2|   20|
+-------+---------+-----------+-----+
Second DataFrame
scala> val someDF2 = Seq(("math",121),("physics",122),("biology",123)).toDF("sid","rno")
scala> someDF2.show
+-------+---+
|    sid|rno|
+-------+---+
|   math|121|
|physics|122|
|biology|123|
+-------+---+
Third DataFrame
scala> val someDF3 = Seq((121,"G-1"),(122,"G-2"),(123,"G-3")).toDF("rno","grade")
scala> someDF3.show
+---+-----+
|rno|grade|
+---+-----+
|121|  G-1|
|122|  G-2|
|123|  G-3|
+---+-----+
scala> someDF.join(someDF2,col("course_id")===col("sid"),"inner").join(someDF3,Seq("rno"),"inner").show
+---+-------+---------+-----------+-----+-------+-----+
|rno|user_id|course_id|lesson_name|score|    sid|grade|
+---+-------+---------+-----------+-----+-------+-----+
|121|  user1|     math|  algebra-1|   90|   math|  G-1|
|122|  user1|  physics|    gravity|   70|physics|  G-2|
|123|  user3|  biology|     health|   50|biology|  G-3|
|123|  user2|  biology|     health|  100|biology|  G-3|
|121|  user1|     math|  algebra-1|   40|   math|  G-1|
|122|  user2|  physics|  gravity-2|   20|physics|  G-2|
+---+-------+---------+-----------+-----+-------+-----+
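The chain above uses inner joins, while the question asks for left-anti joins. One thing to keep in mind is that a left-anti join returns only the columns of the left side, so a later join cannot use a column such as rno that came from an earlier right-hand DataFrame; every condition has to be expressed against the original left columns. Under that constraint, a rough sketch of applying several left-anti joins in one expression is to fold over a list of right-hand DataFrames. The data and the names scores, blockedUsers, retiredCourses and exclusions below are hypothetical and not from the post; Spark still plans this as a sequence of joins, the fold only avoids writing each one by hand.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// In spark-shell this returns the existing session; standalone it starts a local one.
val spark = SparkSession.builder().master("local[*]").appName("anti-join-sketch").getOrCreate()
import spark.implicits._

// Hypothetical data, used only for illustration.
val scores = Seq(
  ("user1", "math", "90"),
  ("user2", "biology", "100"),
  ("user3", "physics", "20")
).toDF("user_id", "course_id", "score")

val blockedUsers   = Seq("user3").toDF("uid")
val retiredCourses = Seq("biology").toDF("cid")

// Each entry: (right-hand DataFrame, key column on the left, key column on the right).
val exclusions: Seq[(DataFrame, String, String)] = Seq(
  (blockedUsers, "user_id", "uid"),
  (retiredCourses, "course_id", "cid")
)

// Apply the left-anti joins one after another; the left side keeps its schema at every step.
val remaining = exclusions.foldLeft(scores) { case (left, (right, leftKey, rightKey)) =>
  left.join(right, left(leftKey) === right(rightKey), "left_anti")
}

remaining.show()
```

With this sample data only the user1 / math row should remain, since user3 is removed by blockedUsers and the biology row by retiredCourses.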
The data itself is not meaningful, but it should serve your purpose. Let me know if you have any questions.