根据条件加入多个火花数据范围



基于" sc"代码,我需要加入可reftable-1或Reftable-2

加入srctable

条件:如果SC为" D",则可以在key = key1上与Reftable-1连接,以获取值。否则,如果SC为" u",则可以与retftable-2连接到key = key2&fk = key3,要获得值。

这是输入Spark DataFrame。

SRCTable:
    -------------
    KEY |SC  |FK 
    -------------
    66  |D   | a
    67  |U   | b
    70  |D   | y
    71  |U   | q
    -------------
 RefTable-1:
    --------------
    KEY1 |Value  | 
    --------------
    66   |xyz1   | 
    67   |abc1   | 
    68   |fgr1   |
    69   |yte1   |
    70   |erx1   |
    71   |ter1   |
    --------------
 RefTable-2:
    --------------------
    KEY2 |KEY3  |Value  | 
    --------------------
    66   | a    |xyz2   | 
    67   | c    |abc2   | 
    67   | b    |fgr2   |
    69   | g    |yte2   |
    70   | y    |erx2   |
    71   | q    |ter2   |
    --------------------

预期输出:

    --------------------
    KEY |SC  |FK |Value |
    -------------------- 
    66  |D   | a |xyz1  |
    67  |U   | b |fgr2  |
    70  |D   | y |erx1  |
    71  |U   | q |ter2  |
    ---------------------

注意:输入表将具有数百万个记录,因此需要优化的解决方案

这是一个代码,您可以使用DataFrame

上的加入函数进行测试
val SRCTable = Seq((66, "D", "a"), (67, "U", "b"), (70, "D", "y"), (71, "U", "q")).toDF("KEY", "SC", "FK")
val RefTable1 = Seq((66, "xyz1"),(67, "abc1"),(68, "fgr1"),(69, "yte1"),(70, "erx1"),(71, "ter1")).toDF("KEY1", "Value")
val RefTable2 = Seq((66, "a", "xyz2"), (67, "c", "abc2"), (67, "b", "fgr2"), (69, "g", "yte2"), (70, "y", "erx2"), (71, "q", "ter2")).toDF("KEY2", "KEY3", "Value")
val join1 = SRCTable.where(SRCTable.col("SC").equalTo("D")).join(RefTable1, SRCTable.col("KEY") === RefTable1.col("KEY1")).select("KEY", "SC", "FK", "Value")
val join2 = SRCTable.where(SRCTable.col("SC").equalTo("U")).join(RefTable2, SRCTable.col("KEY") === RefTable2.col("KEY2") && SRCTable.col("FK") === RefTable2.col("KEY3") ).select("KEY", "SC", "FK", "Value")
join1.unionAll(join2).show 

如果您有任何表演问题,我建议您查看如何对数据进行很好的划分,并查看广播对象,如果您的一个数据帧很小

相关内容

  • 没有找到相关文章

最新更新