I have a DataFrame:
val1 val2
"a" "b"
"b" "a"
"c" "m"
"c" "m"
"m" "c"
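If two rows contain the same pair of values across the two columns, regardless of order, the repeated rows must be removed. So the expected result is: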
val1 val2
"a" "b"
"c" "m"
How can I do that?
You can use boolean indexing with agg/duplicated:
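# Sort each row's values so that ("a", "b") and ("b", "a") compare equal,
# then flag every repeat of a val1/val2 pair after its first occurrence.
m = df.agg(lambda x: sorted(list(x)), axis=1).duplicated(keep="first")
# Keep only the rows that are not flagged as duplicates.
out = df.loc[~m]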
Output:
print(out)
val1 val2
0 "a" "b"
2 "c" "m"
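
For a self-contained version that can be run as is, here is a minimal sketch of the same idea using a different technique: the values of each row are sorted with NumPy rather than with agg, so that order no longer matters, and duplicated then flags the repeats. The DataFrame is rebuilt from the question's data (without the literal quote characters), and the names sorted_pairs and mask are illustrative only; it assumes both columns hold mutually comparable values such as strings.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question.
df = pd.DataFrame({
    "val1": ["a", "b", "c", "c", "m"],
    "val2": ["b", "a", "m", "m", "c"],
})

# Sort the values within each row so ("b", "a") becomes ("a", "b").
# `sorted_pairs` is a helper name chosen for this sketch.
sorted_pairs = pd.DataFrame(
    np.sort(df[["val1", "val2"]].to_numpy(), axis=1),
    index=df.index,
)

# True for every row whose unordered pair was already seen earlier.
mask = sorted_pairs.duplicated(keep="first")

# Boolean indexing: keep the first occurrence of each unordered pair.
out = df.loc[~mask]
print(out)  # the rows with index 0 and 2 remain
```

Because the original index is preserved, the surviving rows keep their labels 0 and 2; chaining reset_index(drop=True) would renumber them if that is preferred.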