Now I need to diff two tables with Spark SQL. I found an answer for SQL Server:
(SELECT *
FROM table1
EXCEPT
SELECT *
FROM table2)
UNION ALL
(SELECT *
FROM table2
EXCEPT
SELECT *
FROM table1)
Can someone tell me how to do the same thing with Spark SQL? (Don't worry about specific columns, just use *.)
You can do it like this:
scala> val df1=sc.parallelize(Seq((1,2),(3,4))).toDF("a","b")
df1: org.apache.spark.sql.DataFrame = [a: int, b: int]
scala> val df2=sc.parallelize(Seq((1,2),(5,6))).toDF("a","b")
df2: org.apache.spark.sql.DataFrame = [a: int, b: int]
scala> df1.createTempView("table1")
scala> df2.createTempView("table2")
scala> spark.sql("select * from table1 EXCEPT select * from table2").show
+---+---+
| a| b|
+---+---+
| 3| 4|
+---+---+
scala> spark.sql("(select * from table2 EXCEPT select * from table1) UNION ALL (select * from table1 EXCEPT select * from table2)").show
+---+---+
| a| b|
+---+---+
| 5| 6|
| 3| 4|
+---+---+
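As an aside, the same symmetric difference can be written with the DataFrame API instead of SQL strings. A minimal sketch, assuming the same df1 and df2 as above (except and union are standard Dataset methods):

scala> // rows in df1 but not in df2, plus rows in df2 but not in df1
scala> val diff = df1.except(df2).union(df2.except(df1))
scala> diff.show   // prints (3,4) and (5,6); row order may vary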
Note: In your case you would have to build the DataFrames from your JDBC calls, then register them as temp views and run the queries above.
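For illustration, a minimal sketch of that JDBC step, assuming a hypothetical SQL Server URL, table names, and credentials (adjust these for your environment):

val jdbcUrl = "jdbc:sqlserver://host:1433;databaseName=mydb"  // hypothetical URL

// read both tables into DataFrames over JDBC
val df1 = spark.read.format("jdbc")
  .option("url", jdbcUrl)
  .option("dbtable", "dbo.table1")   // hypothetical table name
  .option("user", "username")        // hypothetical credentials
  .option("password", "password")
  .load()
val df2 = spark.read.format("jdbc")
  .option("url", jdbcUrl)
  .option("dbtable", "dbo.table2")
  .option("user", "username")
  .option("password", "password")
  .load()

// register temp views and run the same symmetric-difference query
df1.createOrReplaceTempView("table1")
df2.createOrReplaceTempView("table2")
spark.sql("(select * from table1 EXCEPT select * from table2) UNION ALL (select * from table2 EXCEPT select * from table1)").show()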