假设我有以下三个表来表达一种关系,其中帖子被赋予标签(多对多关系):
create table posts (id integer, content text, primary key (id));
create table tags (tag varchar(30), description text, primary key (tag));
create table post_tags (post_id integer, tag varchar(10),
primary key (post_id, tag),
foreign key (post_id) references posts (id),
foreign key (tag) references tags (tag));
现在假设我想找到所有带有标签{聪明,有趣}的帖子,而没有其他标签。
这是我失败的尝试。它会查找带有标签{聪明,有趣}的帖子,但也会查找带有标记{聪明、有趣、烦人}或{聪明的、有趣的、不道德的}的帖子。
select t1.post_id from post_tags as t1
inner join post_tags as t2 on t2.post_id=t1.post_id
where t1.tag='clever' and t2.tag='interesting';
我愿意接受任何改变结构以使操作更容易的建议。但是,我希望避免在posts
表中添加逗号分隔的标记列表之类的事情。
这是一个精确的关系除法问题。
在SQL Server中,一种性能良好的方法(假定post_id,tag
上有唯一约束)是
SELECT post_id
FROM post_tags
GROUP BY post_id
HAVING MIN(CASE
WHEN Keyword IN ( 'clever', 'interesting' ) THEN 1
ELSE 0
END) = 1
AND SUM(CASE
WHEN Keyword IN ( 'clever', 'interesting' ) THEN 1
ELSE 0
END) = 2
因此,我不排除在HAVING
中使用GROUP_CONCAT
的想法。
HAVING GROUP_CONCAT(DISTINCT Keyword ORDER BY Keyword) = 'clever,interesting'
应该能够使用NOT EXISTS
来实现这一点,例如
select t1.post_id
from post_tags as t1
inner join post_tags as t2 on t2.post_id = t1.post_id
where
t1.tag = 'clever'
and t2.tag = 'interesting'
and not exists (
select *
from post_tags t3
where
t3.tag not in ('clever', 'interesting')
and t3.post_id = t1.post_id
)