I am trying to use transform with set, but I get an exception. transform works with "sum" and many other aggregation functions, but not with set or list.
>>> import pandas as pd
>>> df = pd.DataFrame({"a":[1,2,1,], "b":[1,1,2]})
>>> df
   a  b
0  1  1
1  2  1
2  1  2
>>> df.groupby("a").b.transform(set)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/avloss/conda/lib/python3.7/site-packages/pandas/core/groupby/generic.py", line 1032, in transform
    s = klass(res, indexer)
  File "/Users/avloss/conda/lib/python3.7/site-packages/pandas/core/series.py", line 282, in __init__
    "{0!r} type is unordered" "".format(data.__class__.__name__)
TypeError: 'set' type is unordered
What I expected:
   a       b
0  1  {1, 2}
1  2     {1}
2  1  {1, 2}
The simplest approach I can think of is to aggregate each group to a set and then map the result back onto the rows:
df['new_col'] = df['a'].map(df.groupby('a')['b'].agg(set))
print(df)
   a  b  new_col
0  1  1   {1, 2}
1  2  1      {1}
2  1  2   {1, 2}
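To make the two steps of the aggregate-and-map approach explicit, here is a minimal self-contained sketch. The intermediate name sets_per_group is only illustrative and is not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 1], "b": [1, 1, 2]})

# Step 1: one set of "b" values per distinct key in "a".
# The result is a Series indexed by the group key.
sets_per_group = df.groupby("a")["b"].agg(set)

# Step 2: look up each row's key in that Series, which broadcasts
# the group's set back to every row belonging to the group.
df["new_col"] = df["a"].map(sets_per_group)

print(df)
```

Because the lookup goes through the key column, this does not depend on the row order of df.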
There is also unique. Note that unique even preserves the original order of your input:
df.groupby('a').b.transform('unique')
0    [1, 2]
1       [1]
2    [1, 2]
Name: b, dtype: object
Update: if the above does not work, try the following:
df.groupby('a').b.transform(lambda x : [set(x)]*len(x))
0    {1, 2}
1       {1}
2    {1, 2}
Name: b, dtype: object
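The lambda works because it hands transform one value per row of the group instead of a bare set, which pandas refuses to turn into a Series. Below is a sketch of the same idea written as a named function that returns a Series aligned to the group's index. The helper name set_per_row is hypothetical, and building a fresh set for every row is my own variation (the list-multiplication version reuses one set object for all rows of a group).

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 1], "b": [1, 1, 2]})


def set_per_row(group: pd.Series) -> pd.Series:
    """Return the group's distinct values as a set, repeated once per row."""
    values = set(group)
    # Copy the set for each row so rows do not share one mutable object.
    return pd.Series([set(values) for _ in range(len(group))], index=group.index)


result = df.groupby("a")["b"].transform(set_per_row)
print(result)
```

If the rows will never be mutated in place, sharing one set per group as in the lambda is fine and slightly cheaper.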