How can I group by two columns regardless of the order of their values?



I have a data frame:

val1   val2   val3
a       b      10
a       b      2
b       a      3
f       k      5
f       k      2

When I run df.groupby(["val1", "val2"])["val3"].mean().reset_index(), I get:

val1   val2   val3
a       b      6
b       a      3
f       k      3.5
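For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the frame and the order-sensitive result. Building the frame from a dict is an assumption, since only the printed table is shown above, and the name by_order is just illustrative.

```python
import pandas as pd

# Assumed construction of the frame shown above.
df = pd.DataFrame({
    "val1": ["a", "a", "b", "f", "f"],
    "val2": ["b", "b", "a", "k", "k"],
    "val3": [10, 2, 3, 5, 2],
})

# A plain groupby treats (a, b) and (b, a) as two different groups.
by_order = df.groupby(["val1", "val2"])["val3"].mean().reset_index()
print(by_order)
```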

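However, I don't want the order of val1 and val2 to matter, so the rows (a, b) and (b, a) should fall into the same group. The desired result is: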

val1   val2   val3
a       b      5
f       k      3.5

How can I do this?

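import pandas as pd

nm = ["val1", "val2"]
# Build an order-independent key: the sorted pair of values as a tuple.
grp = df[nm].apply(lambda x: tuple(sorted(x)), axis=1)
s = df.val3.groupby(grp).mean()
# Turn the tuple keys back into a two-level index named after the columns.
s.index = pd.MultiIndex.from_tuples(s.index, names=nm)
s.reset_index()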
#   val1 val2  val3
# 0    a    b   5.0
# 1    f    k   3.5
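A possible variation on the same idea, not part of the answer above: sort each row's pair with NumPy instead of a row-wise apply, then group on the normalized columns. This is a sketch under the assumption that val1 and val2 hold mutually comparable values (plain strings here); the names normalized and result are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "val1": ["a", "a", "b", "f", "f"],
    "val2": ["b", "b", "a", "k", "k"],
    "val3": [10, 2, 3, 5, 2],
})

nm = ["val1", "val2"]
# Sort the two key columns within each row so (b, a) becomes (a, b).
# Work on a copy so the original frame keeps its values.
normalized = df.copy()
normalized[nm] = np.sort(df[nm].to_numpy(), axis=1)

# Group on the normalized pair; as_index=False keeps val1/val2 as columns.
result = normalized.groupby(nm, as_index=False)["val3"].mean()
print(result)
```

Unlike the tuple version, this puts the smaller value in val1 for every row before grouping, so no index rebuilding is needed afterwards.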

Another solution, using frozenset:

x = (
    df.groupby(df[["val1", "val2"]].apply(frozenset, axis=1))
    .agg({"val1": "first", "val2": "first", "val3": "mean"})
    .reset_index(drop=True)
)
print(x.to_markdown())

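This prints the result as a Markdown table with one row per unordered pair: a/b with a mean of 5.0 and f/k with a mean of 3.5. Note that to_markdown needs the optional tabulate package.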
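To see why frozenset works as a grouping key, here is a small self-contained sketch. It rebuilds the question's frame (an assumed construction), passes sort=False so the groups keep their first-seen order rather than relying on how set-like keys sort, and uses plain print so tabulate is not required. The names key and out are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "val1": ["a", "a", "b", "f", "f"],
    "val2": ["b", "b", "a", "k", "k"],
    "val3": [10, 2, 3, 5, 2],
})

# frozenset ignores element order and is hashable, so both orderings
# of a pair map to the same group key.
assert frozenset(("a", "b")) == frozenset(("b", "a"))

key = df[["val1", "val2"]].apply(frozenset, axis=1)
out = (
    df.groupby(key, sort=False)
    .agg({"val1": "first", "val2": "first", "val3": "mean"})
    .reset_index(drop=True)
)
print(out)
```

With "first", the val1 and val2 labels shown for each group are taken from the first row seen in that group, so they are not necessarily in sorted order.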
