Delete rows in a pandas DataFrame based on a condition (Python)

I have a DataFrame like this:

import pandas as pd

data = {
    'Index Title': ["Company1", "Company1", "Company2", "Company3"],
    'BusinessType': ['Type 1', 'Type 2', 'Type 1', 'Type 2'],
    'ID1': ['123', '456', '789', '012'],
}
df = pd.DataFrame(data)
# Use the company name as the index, then drop the now-redundant column
df.index = df["Index Title"]
del df["Index Title"]
print(df)

[Image: the resulting DataFrame]

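Here the index title is the company name. For Company1, I have two types: Type 1 and Type 2.

For Company2, I only have Type 1, and for Company3, I only have Type 2.

I want to delete the rows of companies that have only one type, whether that is Type 1 or Type 2.

So in this case, the rows for Company2 and Company3 should be deleted.

Could you help me find the best way to do this?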

For this kind of problem, we usually consider filtering based on groupby and transform, because it is very fast.

df[df.groupby(level=0)['BusinessType'].transform('nunique') > 1]
            BusinessType  ID1
Index Title                  
Company1          Type 1  123
Company1          Type 2  456

The first step is to determine which groups/rows are associated with more than one type:

df.groupby(level=0)['BusinessType'].transform('nunique')
Index Title
Company1    2
Company1    2
Company2    1
Company3    1
Name: BusinessType, dtype: int64

From here, we drop every company whose number of unique types is 1, i.e. we keep only the rows where this count is greater than 1.
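To make the two steps explicit, here is a minimal sketch that rebuilds the sample data and splits the one-liner into a count, a boolean mask, and the final selection. The variable names `type_counts`, `keep`, and `result` are only illustrative and do not appear in the original answer.

```python
import pandas as pd

# Sample data from the question, with the company name as the index
df = pd.DataFrame(
    {
        "BusinessType": ["Type 1", "Type 2", "Type 1", "Type 2"],
        "ID1": ["123", "456", "789", "012"],
    },
    index=pd.Index(
        ["Company1", "Company1", "Company2", "Company3"], name="Index Title"
    ),
)

# Step 1: number of distinct BusinessType values per company,
# broadcast back so that every row carries its group's count
type_counts = df.groupby(level=0)["BusinessType"].transform("nunique")

# Step 2: boolean mask, True for rows whose company has more than one type
keep = type_counts > 1

# Step 3: boolean indexing keeps only the rows where the mask is True
result = df[keep]
print(result)
```

Because `transform` returns a Series aligned with the original rows, the mask can be used directly for boolean indexing without any merge or join.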

Here is one way: group by Index Title, then keep only the groups that contain at least one Type 1 and at least one Type 2.

df = (
    df.groupby('Index Title')
      .filter(lambda x: (x['BusinessType'] == 'Type 1').any()
                        & (x['BusinessType'] == 'Type 2').any())
      .reset_index()
)
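If the list of required types grows, the same idea can be written as a set check. This is only a sketch: the `required` set is a hypothetical name introduced here for illustration, and the sample data is rebuilt so that the snippet runs on its own.

```python
import pandas as pd

# Sample data from the question, with the company name as the index
df = pd.DataFrame(
    {
        "BusinessType": ["Type 1", "Type 2", "Type 1", "Type 2"],
        "ID1": ["123", "456", "789", "012"],
    },
    index=pd.Index(
        ["Company1", "Company1", "Company2", "Company3"], name="Index Title"
    ),
)

# Hypothetical set of types that every kept company must have
required = {"Type 1", "Type 2"}

# Keep a group only if all required types occur among its BusinessType values
result = df.groupby(level=0).filter(
    lambda g: required.issubset(g["BusinessType"])
)
print(result)
```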

Update: if you are looking for companies with two or more types, regardless of whether they are Type 1 or Type 2:

df = (
    df.groupby('Index Title')
      .filter(lambda x: x['BusinessType'].nunique() > 1)
      .reset_index()
)

In that case, @cs95's answer is the cleaner one and is the one you should use.
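As a quick sanity check, the two approaches can be compared on the sample data. This sketch assumes the DataFrame from the question and leaves out the `reset_index()` call so that both results keep the same index; the names `via_transform` and `via_filter` are illustrative only.

```python
import pandas as pd

# Sample data from the question, with the company name as the index
df = pd.DataFrame(
    {
        "BusinessType": ["Type 1", "Type 2", "Type 1", "Type 2"],
        "ID1": ["123", "456", "789", "012"],
    },
    index=pd.Index(
        ["Company1", "Company1", "Company2", "Company3"], name="Index Title"
    ),
)

# transform-based version: build a row-aligned mask, then index with it
via_transform = df[df.groupby(level=0)["BusinessType"].transform("nunique") > 1]

# filter-based version: call a Python function once per group
via_filter = df.groupby(level=0).filter(
    lambda g: g["BusinessType"].nunique() > 1
)

# Both selections should contain the same rows
print(via_transform.equals(via_filter))
```

The `filter` version calls a Python lambda for every group, while the `transform('nunique')` version relies on a built-in aggregation, which is the usual reason the latter tends to be faster on data with many groups.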
