Decorrelating SQL subqueries with inequality predicates

I am trying to decorrelate a query that looks like this:

select A.id, A.other_id, A.data, A.data2,
(select count(*) from B where B.id = A.id and B.data < A.data),
(select count(*) from B where B.id = A.id and B.data < A.data and A.other_id = B.other_id),
(select count(*) from B where B.id = A.id and B.data < A.data and B.sth is True)
from A

I tried things like select ... from A left join B on B.data < A.data where ..., but the results were not exactly the same, and it was slower.

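Is there a sensible way to decorrelate this kind of query?

I want to run it in Spark, which does not support correlated subqueries with inequality predicates.

Or maybe there is a different way to get the same result that works with Spark.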

You can replace it with conditional aggregation:

select a.id, a.other_id, a.data, a.data2,
       sum(case when b.data < a.data then 1 else 0 end),
       sum(case when b.other_id = a.other_id and b.data < a.data then 1 else 0 end),
       sum(case when b.data < a.data and b.sth is true then 1 else 0 end)
from a left join
     b
     on a.id = b.id
group by a.id, a.other_id, a.data, a.data2
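
To check that the conditional-aggregation form returns the same counts as the correlated subqueries, here is a minimal sketch using Python's built-in sqlite3 module rather than Spark. The table layouts, column types, and sample rows are assumptions made up for illustration; only the two queries come from the question and the answer (with the missing commas added).

```python
import sqlite3

# Original form: three correlated subqueries with an inequality predicate.
CORRELATED = """
select A.id, A.other_id, A.data, A.data2,
  (select count(*) from B where B.id = A.id and B.data < A.data),
  (select count(*) from B where B.id = A.id and B.data < A.data
                            and A.other_id = B.other_id),
  (select count(*) from B where B.id = A.id and B.data < A.data
                            and B.sth is true)
from A
order by A.id, A.other_id, A.data, A.data2
"""

# Rewritten form: one equi-join on id, inequalities moved into CASE.
AGGREGATED = """
select A.id, A.other_id, A.data, A.data2,
       sum(case when B.data < A.data then 1 else 0 end),
       sum(case when B.other_id = A.other_id and B.data < A.data then 1 else 0 end),
       sum(case when B.data < A.data and B.sth is true then 1 else 0 end)
from A left join B on A.id = B.id
group by A.id, A.other_id, A.data, A.data2
order by A.id, A.other_id, A.data, A.data2
"""


def run_both():
    """Run both queries on hypothetical sample data and return their rows."""
    conn = sqlite3.connect(":memory:")
    # Assumed schemas and rows; the question does not show the real tables.
    conn.executescript("""
        create table A (id integer, other_id integer, data integer, data2 text);
        create table B (id integer, other_id integer, data integer, sth integer);
        insert into A values
            (1, 10, 5, 'x'), (1, 11, 2, 'y'), (2, 10, 7, 'z'), (3, 12, 1, 'w');
        insert into B values
            (1, 10, 3, 1), (1, 10, 9, 1), (1, 11, 1, 0),
            (2, 10, 6, null), (2, 13, 4, 1);
    """)
    correlated = conn.execute(CORRELATED).fetchall()
    aggregated = conn.execute(AGGREGATED).fetchall()
    conn.close()
    return correlated, aggregated


if __name__ == "__main__":
    correlated, aggregated = run_both()
    for row in aggregated:
        print(row)
    print("same result:", correlated == aggregated)
```

Note that the group by only reproduces the original output if every row of A is uniquely identified by (id, other_id, data, data2). If A can contain duplicate rows, they collapse into one group and their counts are added together, so in that case group by a unique key of A instead.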
