Converting Scala to Spark



I have to convert the code below to Spark, but I don't understand what SQL this code actually executes.

val tempFactDF = unionTempDF.join(fact.select("x","y","d","f","s"),
Seq("x","y","d","f")).dropDuplicates

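Here it performs an inner join on multiple columns, given as Seq("x","y","d","f"). A row from unionTempDF is paired with a row from the selected fact columns whenever x, y, d and f are all equal, and each of those key columns appears only once in the result. The trailing dropDuplicates then removes rows that are identical across all columns.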
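In Spark SQL terms, the statement corresponds to an inner `JOIN ... USING (x, y, d, f)` followed by `SELECT DISTINCT`. The sketch below runs both forms side by side. It assumes a local SparkSession; the object name `UsingJoinSql`, the view names `union_temp` and `fact_t`, and the sample columns `extra` and `other` are hypothetical stand-ins for the real data.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object UsingJoinSql {
  val keys = Seq("x", "y", "d", "f")

  // DataFrame API version from the question.
  def viaApi(unionTempDF: DataFrame, fact: DataFrame): DataFrame =
    unionTempDF.join(fact.select("x", "y", "d", "f", "s"), keys).dropDuplicates()

  // The same query in Spark SQL: JOIN ... USING matches rows whose x, y, d and f
  // are all equal and keeps one copy of each key; DISTINCT drops duplicate rows.
  def viaSql(spark: SparkSession, unionTempDF: DataFrame, fact: DataFrame): DataFrame = {
    unionTempDF.createOrReplaceTempView("union_temp") // hypothetical view names
    fact.createOrReplaceTempView("fact_t")
    spark.sql(
      """SELECT DISTINCT *
        |FROM union_temp
        |JOIN (SELECT x, y, d, f, s FROM fact_t) AS ft
        |USING (x, y, d, f)""".stripMargin)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("using-join").getOrCreate()
    import spark.implicits._

    // Hypothetical sample data standing in for the real unionTempDF and fact.
    val unionTempDF = Seq((1, 1, 1, 1, "a"), (1, 1, 1, 1, "a"), (2, 2, 2, 2, "b"))
      .toDF("x", "y", "d", "f", "extra")
    val fact = Seq((1, 1, 1, 1, 10, "p"), (3, 3, 3, 3, 30, "q"))
      .toDF("x", "y", "d", "f", "s", "other")

    viaSql(spark, unionTempDF, fact).show()
    viaApi(unionTempDF, fact).show()
    spark.stop()
  }
}
```

With this sample data both versions should return the single row x=1, y=1, d=1, f=1, extra=a, s=10: the duplicated "a" row is collapsed by DISTINCT / dropDuplicates, and the unmatched rows are dropped by the inner join.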

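The Seq-based join is roughly equivalent to the following explicit join condition. Two differences: the Seq form keeps a single copy of x, y, d and f, whereas the explicit form keeps both the left and right copies, and the dropDuplicates step is not included here: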

val joiningTable = fact.select("x","y","d","f","s")
unionTempDF.join(joiningTable, unionTempDF("x") === joiningTable("x") &&
unionTempDF("y") === joiningTable("y") &&
unionTempDF("d") === joiningTable("d") &&
unionTempDF("f") === joiningTable("f"))
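The explicit-condition join is not a drop-in replacement for the original statement, because its output carries both copies of the key columns and it omits dropDuplicates. A minimal sketch of closing that gap, assuming unionTempDF has no column named `s` and that the two DataFrames do not share lineage (a self-join can make such conditions ambiguous). The names `ExplicitConditionJoin`, `rawJoin`, `explicitJoin` and the sample columns are hypothetical.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}

object ExplicitConditionJoin {
  val keys = Seq("x", "y", "d", "f")

  // Inner join on unionTempDF("x") === joiningTable("x") && ... for every key.
  // The output keeps BOTH copies of x, y, d and f (left and right).
  def rawJoin(unionTempDF: DataFrame, joiningTable: DataFrame): DataFrame = {
    val condition: Column = keys.map(k => unionTempDF(k) === joiningTable(k)).reduce(_ && _)
    unionTempDF.join(joiningTable, condition)
  }

  // Mimics unionTempDF.join(..., Seq("x","y","d","f")).dropDuplicates():
  // keep the left-hand columns plus s, then remove fully identical rows.
  def explicitJoin(unionTempDF: DataFrame, fact: DataFrame): DataFrame = {
    val joiningTable = fact.select("x", "y", "d", "f", "s")
    val outCols: Seq[Column] =
      unionTempDF.columns.toSeq.map(c => unionTempDF(c)) :+ joiningTable("s")
    rawJoin(unionTempDF, joiningTable).select(outCols: _*).dropDuplicates()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("explicit-join").getOrCreate()
    import spark.implicits._

    // Hypothetical sample data standing in for the real unionTempDF and fact.
    val unionTempDF = Seq((1, 1, 1, 1, "a"), (1, 1, 1, 1, "a"), (2, 2, 2, 2, "b"))
      .toDF("x", "y", "d", "f", "extra")
    val fact = Seq((1, 1, 1, 1, 10, "p"), (3, 3, 3, 3, 30, "q"))
      .toDF("x", "y", "d", "f", "s", "other")

    rawJoin(unionTempDF, fact.select("x", "y", "d", "f", "s")).printSchema() // 10 columns
    explicitJoin(unionTempDF, fact).show()
    spark.stop()
  }
}
```

Selecting the left-hand columns explicitly is what removes the duplicate keys. The resulting column order (unionTempDF's columns, then s) matches the Seq form only when the key columns come first in unionTempDF, as they do in this sample.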
