在PostgreSQL 11.0中,删除基于一列的重复项,并将具有特定模式的行保留在另一列中



我在PostgreSQL 11.0 中有以下表格

entry_id    num            pathway_name
hsa00010    {00010}        Glycolysis / Gluconeogenesis - Homo sapiens (human)
mmu00010    {00010}        Glycolysis / Gluconeogenesis - Mus musculus (mouse)
rno00010    {00010}        Glycolysis / Gluconeogenesis - Rattus norvegicus (rat)
hsa00011    {00011}        Glycolysis / Gluconeogenesis - Homo sapiens (human)
mmu00011    {00011}        Glycolysis / Gluconeogenesis - Mus musculus (mouse)
rno00011    {00011}        Glycolysis / Gluconeogenesis - Rattus norvegicus (rat)
rno00012    {00012}        Glycolysis / Gluconeogenesis - Rattus norvegicus (rat)

我想删除基于列"num"的重复行,并保留以"hsa"开头的entry_id

预期输出为:

entry_id    num            pathway_name
hsa00010    {00010}        Glycolysis / Gluconeogenesis - Homo sapiens (human)
hsa00011    {00011}        Glycolysis / Gluconeogenesis - Homo sapiens (human)
rno00012    {00012}        Glycolysis / Gluconeogenesis - Rattus norvegicus (rat)

我尝试使用"distinct on(num("筛选行,但无法选择hsa entry_id。有没有办法选择带有hsa id的行并删除重复项?

非常感谢您的帮助!!

DISTINCT ON和适当的ORDER BY应该可以做到这一点:

SELECT DISTINCT ON (entry_id) *
FROM atable
ORDER BY entry_id, NOT starts_with(entry_id, 'hsa');

这依赖于CCD_ 3<TRUE

最新更新