使用SQL标识异常的分组记录

我正在尝试使用sql来识别包含异常的分组订单/记录，而不是只将记录与该异常一起输出。显示所有行，包括有异常的行。我需要识别记录的组，其中该组中只有一个项目=0。

order number   item number   qty
12345           G123          5
12345           C123          4 
12345           I123          0 
12345           K123          6

我正在使用Spark SQL，并尝试使用Group-By子句，该子句的计数为不同的订单号>1

旧的经典SQL，看看它的性能很有趣。

使用我自己的数据，但要证明这一点。优化器可能会重写查询。

val df = Seq( 
( "A", "Bangalore", "a*.com", 1, "cpu" ),
( "A", "Bangalore", "a*.com", 9, "cpu" ),
( "A", "Bangalore", "a*.com", 7, "desktop" ),
( "C", "Bangalore", "a*.com", 0, "desktop" ),
( "C", "Bangalore", "a*.com", 0, "desktop" ),
( "D", "Bangalore", "a*.com", 0, "desktop" ),
( "D", "Bangalore", "a*.com", 0, "desktop" ),
( "D", "Bangalore", "a*.com", 0, "desktop" ),
( "B", "Bangalore", "a*.com", 5, "desktop" ),
( "B", "Bangalore", "a*.com", 0, "desktop" ),
( "B", "Bangalore", "a*.com", 19, "monitor" ),
).toDF("name" ,"address", "email", "floor", "resource")
df.createOrReplaceTempView("R")
val res = spark.sql(""" 
select *
from R
where R.name IN ( 
select X.name 
from (select name, count(*) 
from R
where floor = 0
group by name
having count(*) = 1 ) X 
)   

""")
res.show(false)

退货：

+----+---------+------+-----+--------+
|name|address  |email |floor|resource|
+----+---------+------+-----+--------+
|B   |Bangalore|a*.com|5    |desktop |
|B   |Bangalore|a*.com|0    |desktop |
|B   |Bangalore|a*.com|19   |monitor |
+----+---------+------+-----+--------+

相关内容

最新更新

热门标签：