Using 'isin(list1)' in pandas to identify values in a column that are associated with all items in list1

For a given pandas DataFrame like the one below,

h1  h2  h3
mn  a   1
mn  b   1
rs  b   1
pq  a   1
we  c   1

if I filter with isin(), e.g. df[df["h2"].isin(["a","b"])]["h1"].unique(), I get the following result:

h1
mn
rs
pq
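To reproduce this, here is a minimal sketch that rebuilds the frame from the table above and runs the same filter. The variable name any_match is only for illustration.

```python
import pandas as pd

# Rebuild the example frame from the table in the question.
df = pd.DataFrame({
    "h1": ["mn", "mn", "rs", "pq", "we"],
    "h2": ["a", "b", "b", "a", "c"],
    "h3": [1, 1, 1, 1, 1],
})

# isin() keeps every row whose h2 equals ANY of the listed values,
# so a group needs only one matching row to show up in the result.
any_match = df[df["h2"].isin(["a", "b"])]["h1"].unique()
print(list(any_match))
```

Here rs and pq are returned even though each of them has only one of the two values.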

Instead of matching any element of the list, I need the entries that match all elements of the list, i.e. the desired output is:

h1
mn

How exactly can this be done? The number of elements in the list passed to isin() is arbitrary and can be more than 2.

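You can build a mask by testing, for each group, whether the set of required values is a subset of that group's values (issubset):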

s = df.groupby('h1')['h2'].apply(lambda x: set(["a","b"]).issubset(x))
print (s)
h1
mn     True
pq    False
rs    False
we    False
Name: h2, dtype: bool

Then use the mask to select the matching index values:

vals = s.index[s]
print (vals)
Index(['mn'], dtype='object', name='h1')
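If you also want the matching rows rather than just the group labels, the steps above can be put together roughly as follows. This is a sketch that assumes the example frame from the question; the names required and result are mine, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    "h1": ["mn", "mn", "rs", "pq", "we"],
    "h2": ["a", "b", "b", "a", "c"],
    "h3": [1, 1, 1, 1, 1],
})

# Values that every selected group must contain (any length works).
required = {"a", "b"}

# One boolean per h1 group: True if all required values occur in its h2.
s = df.groupby("h1")["h2"].apply(lambda x: required.issubset(x))

# Group labels that passed the check.
vals = s.index[s]

# Map the labels back onto the original rows.
result = df[df["h1"].isin(vals)]
print(result)
```

Note that isin() is still useful here, but it is applied to the group labels after the "all values present" check instead of to h2 directly.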

Use groupby.filter together with np.isin:

import numpy as np

# keep only the groups whose h2 values contain every required item
new_df = df.groupby('h1').filter(lambda x: np.isin(['a','b'],x['h2']).all())
print(new_df)
h1 h2  h3
0  mn  a   1
1  mn  b   1

Or, to get just the group labels, build a boolean mask per group with the same np.isin check:

s = df.groupby('h1')['h2'].apply(lambda x: np.isin(['a','b'],x).all())
s.index[s]
#Index(['mn'], dtype='object', name='h1')
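Since the list can have any length, it may be convenient to wrap the check in a small helper. The sketch below uses a different technique from the two answers: it first keeps only rows with a required value, then counts distinct values per group. The function name groups_with_all and its parameters are hypothetical, chosen for this example.

```python
import pandas as pd

def groups_with_all(df, key, col, required):
    """Return the labels in `key` whose `col` values include every item in `required`."""
    required = set(required)
    # Keep only rows that carry one of the required values.
    matched = df[df[col].isin(list(required))]
    # A group qualifies when it has as many distinct matches as required items.
    counts = matched.groupby(key)[col].nunique()
    return counts.index[counts == len(required)].tolist()

df = pd.DataFrame({
    "h1": ["mn", "mn", "rs", "pq", "we"],
    "h2": ["a", "b", "b", "a", "c"],
    "h3": [1, 1, 1, 1, 1],
})

print(groups_with_all(df, "h1", "h2", ["a", "b"]))
print(groups_with_all(df, "h1", "h2", ["a", "b", "c"]))
```

Counting distinct values avoids calling a Python lambda per group, which may matter on large frames, but I have not benchmarked it against the answers above.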
