我看过很多关于删除重复项的帖子,但这些都不适用于我的案例。我只关心数据集是否包含与多个行业相关的ID,如果一个ID有多个行业,请从数据集中完全删除该ID和与其相关的行。这可以用SQL完成吗?蟒蛇
例如:
ID | Date | Industry |
S000123 | oct/1/22 | Media |
S000123 | oct/1/22 | Education |
S000456 | oct/4/22 | Auto |
S000789 | oct/4/22 | Beverage |
成为
ID | Date | Industry |
S000456 | oct/4/22 | Auto |
S000789 | oct/4/22 | Beverage |
这将只选择您要查找的行:
select *
from data
where ID in (
select ID
from data
group by ID
having count(distinct Industry) <= 1
)
内部查询只选择具有一个或多个不同Industry值(如果Industry为NULL(的ID。
你可以做:
delete from t
where id in (
select id from t group by id having count(distinct industry) > 1
)
见小提琴。