Pandas: check whether two values occur together but in a different order, and store them

I have a dataframe like this:

animal, small_animal, count
cat, dog, 1
cat, moo, 2
dog, cat, 3
moo, moo, 5
squirrel, moo, 1
moo, cat, 3

I would like cat, dog and dog, cat to be stored together.

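So I need to check whether values co-occur in the first two columns but in a different order, while keeping the third column. I thought of using several separate dataframes or a dictionary. So far I have done a groupby, but I still can't wrap my head around the rest.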
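To try the answers below, the sample data from the question can be rebuilt as a DataFrame. This is a minimal sketch; it assumes count is an integer column and that the column names are exactly those shown in the question.

```python
import pandas as pd

# Sample data copied from the question; dtypes are an assumption.
df = pd.DataFrame(
    {
        "animal": ["cat", "cat", "dog", "moo", "squirrel", "moo"],
        "small_animal": ["dog", "moo", "cat", "moo", "moo", "cat"],
        "count": [1, 2, 3, 5, 1, 3],
    }
)

print(df)
```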

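You can try comparing the two columns concatenated in both orders, then filter out the rows where both columns hold the same animal.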

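# True where this row's animal+small_animal string also occurs as some row's small_animal+animal
m = (df['animal']+df['small_animal']).isin(df['small_animal']+df['animal'])
# drop rows where both columns hold the same animal, e.g. moo, moo
out = df[m & df['animal'].ne(df['small_animal'])]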

print(out)
  animal small_animal  count
0    cat          dog      1
1    cat          moo      2
2    dog          cat      3
5    moo          cat      3
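One caveat with plain string concatenation: two unrelated pairs can produce the same joined string, for example ('ab', 'c') and ('bc', 'a') both yield 'abc'. A hedged variant is to put a separator between the names. The sketch below assumes the separator "|" never appears in an animal name; SEP is a name introduced here for illustration only.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "animal": ["cat", "cat", "dog", "moo", "squirrel", "moo"],
        "small_animal": ["dog", "moo", "cat", "moo", "moo", "cat"],
        "count": [1, 2, 3, 5, 1, 3],
    }
)

SEP = "|"  # assumption: this character never occurs in the names

# Join the names in both orders, with a separator to avoid accidental collisions
forward = df["animal"] + SEP + df["small_animal"]
backward = df["small_animal"] + SEP + df["animal"]

# Keep rows whose pair also appears reversed, excluding same-name pairs
m = forward.isin(backward)
out = df[m & df["animal"].ne(df["small_animal"])]

print(out)
```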

You can create a new column that holds a label:

df["label_col"] = df[["animal", "small_animal"]].apply(
    lambda x: "-".join(sorted(x)), axis=1
)
"""
Output
     animal small_animal count     label_col
0       cat          dog     1       cat-dog
1       cat          moo     2       cat-moo
2       dog          cat     3       cat-dog
3       moo          moo     5       moo-moo
4  squirrel          moo     1  moo-squirrel
5       moo          cat     3       cat-moo
"""

Then you can group by label_col, or use it as a sort key, or do whatever else you need with it.
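As one possible follow-up to the label column, the rows can be grouped by label and stored in a dictionary, which is one of the options the question mentions. This is only a sketch: the names distinct, both_orders and pairs are introduced here for illustration, and "occurs in both orders" is interpreted as a label whose rows contain more than one distinct value in animal.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "animal": ["cat", "cat", "dog", "moo", "squirrel", "moo"],
        "small_animal": ["dog", "moo", "cat", "moo", "moo", "cat"],
        "count": [1, 2, 3, 5, 1, 3],
    }
)

# Order-independent label, as in the answer above
df["label_col"] = df[["animal", "small_animal"]].apply(
    lambda x: "-".join(sorted(x)), axis=1
)

# Ignore rows where both names are the same (e.g. moo, moo)
distinct = df[df["animal"] != df["small_animal"]]

# Keep only labels that were seen in both orders
both_orders = distinct.groupby("label_col").filter(
    lambda g: g["animal"].nunique() > 1
)

# One small frame per unordered pair, keyed by its label
pairs = {
    label: group[["animal", "small_animal", "count"]]
    for label, group in both_orders.groupby("label_col")
}

for label, group in pairs.items():
    print(label)
    print(group)
```

Here pairs["cat-dog"] holds the cat, dog and dog, cat rows together with their counts.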

Records with different names that also occur in reflected (reversed) form:

names = ['animal', 'small_animal']
# include all pairs of animal names which occur in reflected form
is_reflected = pd.Index(df[names]).isin(pd.Index(df[names[::-1]]))
# exclude records where both names are the same, such as ('moo', 'moo') pairs
is_different = df.animal != df.small_animal
# extract counts for records with reflected and different names
df.loc[is_reflected & is_different, 'count']
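How pd.Index treats a DataFrame argument may differ between pandas versions (this is an assumption, not something checked here). A hedged alternative that expresses the same idea is to build the pairs explicitly with pd.MultiIndex.from_frame and compare them against a list of reversed tuples. The names pairs, reversed_pairs and counts are introduced for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "animal": ["cat", "cat", "dog", "moo", "squirrel", "moo"],
        "small_animal": ["dog", "moo", "cat", "moo", "moo", "cat"],
        "count": [1, 2, 3, 5, 1, 3],
    }
)

names = ["animal", "small_animal"]

# Each row as an (animal, small_animal) pair
pairs = pd.MultiIndex.from_frame(df[names])

# The same rows with the two names swapped, as plain tuples
reversed_pairs = list(zip(df["small_animal"], df["animal"]))

# True where a row's pair also exists in swapped form somewhere in the frame
is_reflected = pairs.isin(reversed_pairs)

# False for same-name pairs such as ('moo', 'moo')
is_different = (df["animal"] != df["small_animal"]).to_numpy()

# Counts for records with reflected and different names
counts = df.loc[is_reflected & is_different, "count"]

print(counts)
```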
