PostgreSQL: INNER JOIN does not exclude anything and returns the whole table



This is the "tasks" table:

+----+------------------------+-----------+-----------+-----------+----------------+
| id | task                   | subtask_1 | subtask_2 | subtask_3 | active_subtask |
+----+------------------------+-----------+-----------+-----------+----------------+
| 1  | Earn 1 million dollars | Earn 100k | Earn 500k | Earn 1m   | 3              |
+----+------------------------+-----------+-----------+-----------+----------------+
| 2  | Earn 2 million dollars | Earn 1m   | Earn 1.5m | Earn 2m   | 1              |
+----+------------------------+-----------+-----------+-----------+----------------+
| 3  | Earn 3 million dollars | Earn 2m   | Earn 2.5m | Earn 3m   | 2              |
+----+------------------------+-----------+-----------+-----------+----------------+
| 4  | ...                    | ...       | ...       | ...       | ...            |
+----+------------------------+-----------+-----------+-----------+----------------+

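I have two queries that I want to join in order to get the list of tasks with a non-unique active subtask. The first one gives me the active subtask of each task (with some "where" conditions):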

+----+-----------+
| id | subtask   |
+----+-----------+
| 1  | Earn 1m   |
+----+-----------+
| 2  | Earn 1m   |
+----+-----------+
| 3  | Earn 2.5m |
+----+-----------+
| .  | ...       |
+----+-----------+

The second one gives me the list of all non-unique active subtasks (i.e. active subtasks that are shared by more than one task):

+---------+
| subtask |
+---------+
| Earn 1m |
+---------+
| ...     |
+---------+

Here I try to use INNER JOIN to get all tasks that have a non-unique active subtask. The expected result is:

+----+---------+
| id | subtask |
+----+---------+
| 1  | Earn 1m |
+----+---------+
| 2  | Earn 1m |
+----+---------+
| .  | ...     |
+----+---------+

But I always get the full result of the first query.

The simplified SQL query (without the "where" clauses) looks like this:

SELECT id,
       CASE WHEN active_subtask = 1 THEN subtask_1
            WHEN active_subtask = 2 THEN subtask_2
            WHEN active_subtask = 3 THEN subtask_3
       END AS subtask
FROM "tasks"
INNER JOIN (
    SELECT CASE WHEN active_subtask = 1 THEN subtask_1
                WHEN active_subtask = 2 THEN subtask_2
                WHEN active_subtask = 3 THEN subtask_3
           END AS subtask
    FROM "tasks"
    GROUP BY subtask
    HAVING (COUNT(*) > 1)
) AS agg
ON subtask = agg.subtask

What am I doing wrong?

I read your question as "which distinct tasks share the same active subtask". If I have that right, here is one way to do it.

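with cte (id, subtask) as (
    -- pick the active subtask by position from a '|'-delimited string
    select id,
           split_part(concat_ws('|', subtask_1, subtask_2, subtask_3), '|', active_subtask)
    from tasks
)
-- self-join: keep rows whose subtask also appears under a different id
select a.*
from cte a
join cte b on a.subtask = b.subtask and a.id <> b.id;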


This approach is a bit fragile, so watch out for nulls, and make sure the | delimiter is not itself part of the data.
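
One way to sidestep both caveats, offered as a sketch rather than as part of the original answer, is to index into a PostgreSQL array instead of building a delimited string. concat_ws skips NULL arguments, so a NULL in subtask_1 would shift the later subtasks one position to the left; array elements keep their positions, and no delimiter is involved. The sketch assumes the table is named tasks, as in the question, and that active_subtask is an integer.

```sql
-- Sketch: pick the active subtask by array position (PostgreSQL arrays are 1-based).
-- Assumes active_subtask is an integer column holding 1, 2, or 3.
with cte (id, subtask) as (
    select id,
           (array[subtask_1, subtask_2, subtask_3])[active_subtask]
    from tasks
)
-- distinct avoids repeating a row when three or more tasks share a subtask
select distinct a.*
from cte a
join cte b on a.subtask = b.subtask and a.id <> b.id;
```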


Another option is

with cte (id, subtask) as (
    select id,
           case when active_subtask = 1 then subtask_1
                when active_subtask = 2 then subtask_2
                when active_subtask = 3 then subtask_3
           end
    from tasks
)
select *
from cte
where subtask in (
    select subtask
    from cte
    group by subtask
    having count(distinct id) > 1
);
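
For comparison, the join attempted in the question can be made to work on top of the same kind of CTE. As far as I can tell, the original ON subtask = agg.subtask misbehaves because a SELECT-list alias is not visible inside an ON clause. The only column actually named subtask in scope there is agg.subtask, so the condition compares agg.subtask with itself and every task row matches every row of agg. Computing the alias in a CTE first and qualifying both sides of the comparison avoids that. The names active, a, and agg below are only illustrative.

```sql
-- Sketch: the question's INNER JOIN with both sides of the ON clause qualified.
WITH active AS (
    SELECT id,
           CASE active_subtask
               WHEN 1 THEN subtask_1
               WHEN 2 THEN subtask_2
               WHEN 3 THEN subtask_3
           END AS subtask
    FROM tasks
)
SELECT a.id, a.subtask
FROM active AS a
INNER JOIN (
    -- active subtasks that occur for more than one task
    SELECT subtask
    FROM active
    GROUP BY subtask
    HAVING COUNT(*) > 1
) AS agg ON a.subtask = agg.subtask;
```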


I suggest a different approach:

/* non-unique subtasks */
select id, task, subtask
from (
    select id, task, subtask,
           count(*) over (partition by subtask) as c
    from (
        select id, task, subtask_1 as subtask from tasks where subtask_1 is not null
        union all
        select id, task, subtask_2 as subtask from tasks where subtask_2 is not null
        union all
        select id, task, subtask_3 as subtask from tasks where subtask_3 is not null
    ) fddl
) derived
where c > 1;

/* unique subtasks */
select id, task, subtask
from (
    select id, task, subtask,
           count(*) over (partition by subtask) as c
    from (
        select id, task, subtask_1 as subtask from tasks where subtask_1 is not null
        union all
        select id, task, subtask_2 as subtask from tasks where subtask_2 is not null
        union all
        select id, task, subtask_3 as subtask from tasks where subtask_3 is not null
    ) fddl
) derived
where c = 1;
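
Note that the two queries above count a subtask across all three columns, whether or not it is the active one. If only the active subtask should be considered, as in the question, one possible adaptation (a sketch, not the answerer's code) is to filter each branch on active_subtask. The alias active_only is made up for this example.

```sql
/* non-unique active subtasks only (sketch) */
select id, task, subtask
from (
    select id, task, subtask,
           count(*) over (partition by subtask) as c
    from (
        -- each branch contributes a row only when its column is the active one
        select id, task, subtask_1 as subtask from tasks where active_subtask = 1 and subtask_1 is not null
        union all
        select id, task, subtask_2 as subtask from tasks where active_subtask = 2 and subtask_2 is not null
        union all
        select id, task, subtask_3 as subtask from tasks where active_subtask = 3 and subtask_3 is not null
    ) active_only
) derived
where c > 1;
```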

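The main problem here is that you have a denormalized data structure, which limits your ability to write simple queries. In my opinion it would be better to keep the subtasks in a separate table with a many-to-one relationship to the tasks table. In effect, that is what the union subquery above "sort of" simulates.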
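
A rough sketch of what such a layout could look like follows. Every table and column name in it (tasks_normalized, subtasks, ordinal, is_active) is hypothetical and chosen only for illustration; the real schema would depend on requirements the question does not state.

```sql
-- Hypothetical normalized layout: one row per subtask instead of three columns.
CREATE TABLE tasks_normalized (
    id   integer PRIMARY KEY,
    task text NOT NULL
);

CREATE TABLE subtasks (
    task_id   integer NOT NULL REFERENCES tasks_normalized (id),
    ordinal   integer NOT NULL,               -- 1, 2, 3, ... within a task
    subtask   text    NOT NULL,
    is_active boolean NOT NULL DEFAULT false, -- marks the task's active subtask
    PRIMARY KEY (task_id, ordinal)
);

-- Tasks whose active subtask is shared with at least one other task.
SELECT task_id, subtask
FROM (
    SELECT task_id, subtask,
           COUNT(*) OVER (PARTITION BY subtask) AS c
    FROM subtasks
    WHERE is_active
) AS s
WHERE c > 1;
```

With the subtasks stored as rows, the CASE expression and the union of three columns are no longer needed, and adding a fourth subtask does not require a schema change.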
