查询以查找具有一组精确标记的帖子(多对多关系)



假设我有以下三个表来表达一种关系,其中帖子被赋予标签(多对多关系):

create table posts (id integer, content text, primary key (id));
create table tags (tag varchar(30), description text, primary key (tag));
create table post_tags (post_id integer, tag varchar(10),
    primary key (post_id, tag),
    foreign key (post_id) references posts (id),
    foreign key (tag) references tags (tag));

现在假设我想找到所有带有标签{聪明,有趣}的帖子,而没有其他标签。

这是我失败的尝试。它会查找带有标签{聪明,有趣}的帖子,但也会查找带有标记{聪明、有趣、烦人}或{聪明的、有趣的、不道德的}的帖子。

select t1.post_id from post_tags as t1
    inner join post_tags as t2 on t2.post_id=t1.post_id
    where t1.tag='clever' and t2.tag='interesting';

我愿意接受任何改变结构以使操作更容易的建议。但是,我希望避免在posts表中添加逗号分隔的标记列表之类的事情。

这是一个精确的关系除法问题。

在SQL Server中,一种性能良好的方法(假定post_id,tag上有唯一约束)是

SELECT post_id
FROM   post_tags
GROUP  BY post_id
HAVING MIN(CASE
             WHEN Keyword IN ( 'clever', 'interesting' ) THEN 1
             ELSE 0
           END) = 1
       AND SUM(CASE
                 WHEN Keyword IN ( 'clever', 'interesting' ) THEN 1
                 ELSE 0
               END) = 2  

因此,我不排除在HAVING中使用GROUP_CONCAT的想法。

HAVING GROUP_CONCAT(DISTINCT Keyword ORDER BY Keyword) = 'clever,interesting'

应该能够使用NOT EXISTS来实现这一点,例如

select t1.post_id
from post_tags as t1
    inner join post_tags as t2 on t2.post_id = t1.post_id
where 
    t1.tag = 'clever' 
and t2.tag = 'interesting' 
and not exists (
    select *
    from post_tags t3
    where 
        t3.tag not in ('clever', 'interesting')
    and t3.post_id = t1.post_id
)

最新更新