How to select the first n records per category in SQL without duplicates

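I'm trying to select a set of recipes from a database by category. The requirement is that I need n recipes per category, with no duplicates. So, given the following dataset, recipes: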

id | category
---|---------
1  | dairy
1  | eggs
1  | vegetarian
2  | dairy
2  | dessert
3  | thanksgiving
...

Is it possible to write the select so that, with n=1, I get a dataset that looks like this?

id  | category
----|----------
1   | dairy
2   | dessert
3   | thanksgiving

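I happen to be using Presto to query this dataset, and there are about 30 categories in total. I initially thought I could write some nested UNION statements, but a) that would be tedious for the number of categories I have, and b) I don't think it would work, because each UNION is evaluated on its own and knows nothing about the rows the previous ones returned. I also considered using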

select id from (
  select id, category,
         row_number() over (partition by category order by id) as row_num
  from recipes
) ranked
where row_num < 2

This lets me set how many ids are returned per category, but it does nothing to remove duplicate ids.
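To make the duplicate problem concrete, here is a minimal sketch that runs the query above against the sample rows in an in-memory SQLite database. SQLite standing in for Presto is an assumption: the row_number() over (partition by ... order by ...) syntax is the same in both, and SQLite needs version 3.25 or later for window functions. The query also selects category (not just id) so the output is easier to read, and the rows hidden behind "..." in the question are not reproduced.

```python
import sqlite3

# Sample rows from the question (the elided "..." rows are left out).
SAMPLE = [
    (1, "dairy"), (1, "eggs"), (1, "vegetarian"),
    (2, "dairy"), (2, "dessert"),
    (3, "thanksgiving"),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table recipes (id integer, category text)")
conn.executemany("insert into recipes values (?, ?)", SAMPLE)

# First id per category (n = 1). Each category is ranked independently,
# so nothing stops the same id from winning several categories.
PER_CATEGORY = """
select id, category
from (select id, category,
             row_number() over (partition by category order by id) as row_num
      from recipes) ranked
where row_num < 2
order by category
"""

rows = conn.execute(PER_CATEGORY).fetchall()
print(rows)
# [(1, 'dairy'), (2, 'dessert'), (1, 'eggs'), (3, 'thanksgiving'), (1, 'vegetarian')]
```

Recipe 1 comes back three times, once each for dairy, eggs and vegetarian, which is exactly the duplication the question wants to avoid.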

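Ultimately, I have a feeling this isn't possible in SQL and I should move it into Python or something similar, but if it is possible, I'd be very interested to see it in action!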

You're close. Partition by id instead:

select id, category
from (select id, category,
             row_number() over (partition by id order by id) as seqnum
      from recipes
     ) r
where seqnum = 1;

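The order by only makes a difference if you want to control which row is kept, for example the category that comes first alphabetically (order by category). With order by id inside partition by id, every row in a partition ties, so which category survives is arbitrary.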
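As a quick check of the partition-by-id approach, here is a minimal sketch on the sample rows, again assuming SQLite (3.25+) as a stand-in for Presto. It uses order by category inside the window so the kept row is deterministic; that ordering choice is an illustration, not part of the original answer's query.

```python
import sqlite3

SAMPLE = [(1, "dairy"), (1, "eggs"), (1, "vegetarian"),
          (2, "dairy"), (2, "dessert"), (3, "thanksgiving")]

conn = sqlite3.connect(":memory:")
conn.execute("create table recipes (id integer, category text)")
conn.executemany("insert into recipes values (?, ?)", SAMPLE)

# One row per id; ordering by category keeps the alphabetically first
# category for each id instead of leaving the choice to the engine.
ONE_PER_ID = """
select id, category
from (select id, category,
             row_number() over (partition by id order by category) as seqnum
      from recipes) r
where seqnum = 1
order by id
"""

rows = conn.execute(ONE_PER_ID).fetchall()
print(rows)
# [(1, 'dairy'), (2, 'dairy'), (3, 'thanksgiving')]
```

Each id now appears once, but dairy appears twice, so on its own this query removes duplicate ids without capping the number of rows per category.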

Note: if what you want is one id per category, I would suggest aggregation instead:

select category, min(id) as id
from recipes
group by category;
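For comparison, this minimal sketch runs the aggregation on the same sample rows, with SQLite assumed as a stand-in for Presto (plain group by needs no window-function support).

```python
import sqlite3

SAMPLE = [(1, "dairy"), (1, "eggs"), (1, "vegetarian"),
          (2, "dairy"), (2, "dessert"), (3, "thanksgiving")]

conn = sqlite3.connect(":memory:")
conn.execute("create table recipes (id integer, category text)")
conn.executemany("insert into recipes values (?, ?)", SAMPLE)

# Smallest id per category; an outer order by is added only to make
# the printed result deterministic.
rows = conn.execute(
    "select category, min(id) as id from recipes group by category order by category"
).fetchall()
print(rows)
# [('dairy', 1), ('dessert', 2), ('eggs', 1), ('thanksgiving', 3), ('vegetarian', 1)]
```

This is the opposite trade-off from partitioning by id: every category gets exactly one id, but id 1 is reused for dairy, eggs and vegetarian.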
