基于" sc"代码,我需要加入可reftable-1或Reftable-2
加入srctable条件:如果SC为" D",则可以在key = key1上与Reftable-1连接,以获取值。否则,如果SC为" u",则可以与retftable-2连接到key = key2&fk = key3,要获得值。
这是输入Spark DataFrame。
SRCTable:
-------------
KEY |SC |FK
-------------
66 |D | a
67 |U | b
70 |D | y
71 |U | q
-------------
RefTable-1:
--------------
KEY1 |Value |
--------------
66 |xyz1 |
67 |abc1 |
68 |fgr1 |
69 |yte1 |
70 |erx1 |
71 |ter1 |
--------------
RefTable-2:
--------------------
KEY2 |KEY3 |Value |
--------------------
66 | a |xyz2 |
67 | c |abc2 |
67 | b |fgr2 |
69 | g |yte2 |
70 | y |erx2 |
71 | q |ter2 |
--------------------
预期输出:
--------------------
KEY |SC |FK |Value |
--------------------
66 |D | a |xyz1 |
67 |U | b |fgr2 |
70 |D | y |erx1 |
71 |U | q |ter2 |
---------------------
注意:输入表将具有数百万个记录,因此需要优化的解决方案
这是一个代码,您可以使用DataFrame
val SRCTable = Seq((66, "D", "a"), (67, "U", "b"), (70, "D", "y"), (71, "U", "q")).toDF("KEY", "SC", "FK")
val RefTable1 = Seq((66, "xyz1"),(67, "abc1"),(68, "fgr1"),(69, "yte1"),(70, "erx1"),(71, "ter1")).toDF("KEY1", "Value")
val RefTable2 = Seq((66, "a", "xyz2"), (67, "c", "abc2"), (67, "b", "fgr2"), (69, "g", "yte2"), (70, "y", "erx2"), (71, "q", "ter2")).toDF("KEY2", "KEY3", "Value")
val join1 = SRCTable.where(SRCTable.col("SC").equalTo("D")).join(RefTable1, SRCTable.col("KEY") === RefTable1.col("KEY1")).select("KEY", "SC", "FK", "Value")
val join2 = SRCTable.where(SRCTable.col("SC").equalTo("U")).join(RefTable2, SRCTable.col("KEY") === RefTable2.col("KEY2") && SRCTable.col("FK") === RefTable2.col("KEY3") ).select("KEY", "SC", "FK", "Value")
join1.unionAll(join2).show
如果您有任何表演问题,我建议您查看如何对数据进行很好的划分,并查看广播对象,如果您的一个数据帧很小