Here is a sample dataset:

import pandas as pd

test = pd.DataFrame({
    'a': [1, 2, 3] * 2,
    'b': ['a', 'a', 'b', 'b', 'b', 'b'],
    'c': [123, 456, 456, 123, 456, 123],
})
print(test)
   a  b    c
0  1  a  123
1  2  a  456
2  3  b  456
3  1  b  123
4  2  b  456
5  3  b  123

If I group by columns 'a' and 'b' and then try to get a list of the unique values of 'c' in each group using transform, I get this:

# using transform
print(test.groupby(['a', 'b']).c.transform(pd.Series.unique))
0    123
1    456
2    456
3    123
4    456
5    123
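A likely explanation for the output above: transform treats whatever the function returns for a group as that group's per-row values, so the array from Series.unique is not kept as one list-valued cell. The minimal check below uses only the question's data and standard pandas calls; it shows that in this sample every group happens to have exactly as many unique values as rows, which is why the "unique" values simply land back on the original rows.

```python
import pandas as pd

test = pd.DataFrame({
    'a': [1, 2, 3] * 2,
    'b': ['a', 'a', 'b', 'b', 'b', 'b'],
    'c': [123, 456, 456, 123, 456, 123],
})

# For each (a, b) group, record (number of rows, number of unique 'c' values).
sizes = {}
for key, group in test.groupby(['a', 'b'])['c']:
    sizes[key] = (len(group), len(group.unique()))
print(sizes)

# Each group's unique array is as long as the group itself, so transform
# spreads it back over the rows one value per row instead of storing a list.
spread = test.groupby(['a', 'b'])['c'].transform(pd.Series.unique)
print(spread.tolist())
```

If a group had fewer unique values than rows, the lengths would no longer line up, so this approach cannot produce a list per row in general.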

If I use unique instead, I almost get the expected output:

# almost expected output
print(test.groupby(['a', 'b']).c.unique())
a  b
1  a         [123]
   b         [123]
2  a         [456]
   b         [456]
3  b    [456, 123]
Name: c, dtype: object

What I'd like is to get a pd.Series that looks like this, using transform:

Expected output

0         [123]
1         [456]
2    [456, 123]
3         [123]
4         [456]
5    [456, 123]
dtype: object

I know I can use transform to get the nunique values of 'c' as a Series, like this:

print(test.groupby(['a', 'b']).c.transform(pd.Series.nunique))
0    1
1    1
2    2
3    1
4    1
5    2
Name: c, dtype: int64

Question

Why can't I do something similar with unique and transform?
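One way to get the list onto every row is to make the function return something transform can align row by row: a Series with the group's own index whose every element is the full list of unique values. This is a sketch under that assumption; `unique_list` is a hypothetical helper name, not a pandas function.

```python
import pandas as pd

test = pd.DataFrame({
    'a': [1, 2, 3] * 2,
    'b': ['a', 'a', 'b', 'b', 'b', 'b'],
    'c': [123, 456, 456, 123, 456, 123],
})


def unique_list(s: pd.Series) -> pd.Series:
    """Hypothetical helper: put the group's unique values, as one list, on every row."""
    values = s.unique().tolist()  # unique values in order of first appearance
    # Same index as the group, so transform gets exactly one element per row,
    # and each element is a separate copy of the full list.
    return pd.Series([list(values) for _ in range(len(s))], index=s.index)


result = test.groupby(['a', 'b'])['c'].transform(unique_list)
print(result)
```

Copying the list per row avoids every row of a group sharing one list object, so mutating one cell later would not silently change the others.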

Side note

I know I could do a groupby with unique, then reset_index and merge the result back onto the original data, but I'm hoping for a more pythonic / pandas-friendly approach.
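For comparison, the groupby / unique / reset_index / merge route mentioned here might look like the following sketch; the column name `c_unique` is an assumption made for illustration.

```python
import pandas as pd

test = pd.DataFrame({
    'a': [1, 2, 3] * 2,
    'b': ['a', 'a', 'b', 'b', 'b', 'b'],
    'c': [123, 456, 456, 123, 456, 123],
})

# Step 1: one row per (a, b) group holding its array of unique 'c' values.
# 'c_unique' is a hypothetical column name chosen for this sketch.
uniques = (
    test.groupby(['a', 'b'])['c']
    .unique()
    .rename('c_unique')
    .reset_index()
)

# Step 2: merge back onto the original rows. how='left' keeps the row order
# of `test`, and each (a, b) key matches exactly one row in `uniques`.
merged = test.merge(uniques, on=['a', 'b'], how='left')
print(merged)
```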

I also tried using set with transform, but that raised an error:

print(test.groupby(['a', 'b']).c.transform(set))
TypeError: 'set' type is unordered
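The error appears to come from pandas refusing to build a Series from a set, since a set has no defined order to map onto the group's rows. The sketch below reproduces that constructor error directly and then keeps set semantics by returning a sorted list on each row; `sorted_unique` is a hypothetical helper, and sorting is an assumption (it discards first-appearance order).

```python
import pandas as pd

test = pd.DataFrame({
    'a': [1, 2, 3] * 2,
    'b': ['a', 'a', 'b', 'b', 'b', 'b'],
    'c': [123, 456, 456, 123, 456, 123],
})

# The Series constructor rejects sets because they are unordered.
try:
    pd.Series({123, 456})
    set_error = None
except TypeError as exc:
    set_error = str(exc)
print(set_error)


def sorted_unique(s: pd.Series) -> pd.Series:
    """Hypothetical helper: set semantics, but as a sorted list on every row."""
    values = sorted(set(s))
    return pd.Series([list(values) for _ in range(len(s))], index=s.index)


result = test.groupby(['a', 'b'])['c'].transform(sorted_unique)
print(result)
```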

Does this work for you?

test.groupby(['a', 'b'])['c'].transform('unique')

Output:

0         [123]
1         [456]
2    [456, 123]
3         [123]
4         [456]
5    [456, 123]
Name: c, dtype: object
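Depending on the pandas version, transform may only accept a fixed set of method names as strings, in which case transform('unique') raises a ValueError instead of producing the output above. A version-independent sketch that rebuilds the same Series from groupby().unique() with an ordinary dict lookup (the names `lookup` and `result` are just for illustration):

```python
import pandas as pd

test = pd.DataFrame({
    'a': [1, 2, 3] * 2,
    'b': ['a', 'a', 'b', 'b', 'b', 'b'],
    'c': [123, 456, 456, 123, 456, 123],
})

# One array of unique 'c' values per group, keyed by (a, b) tuples.
lookup = test.groupby(['a', 'b'])['c'].unique().to_dict()

# Look up each row's (a, b) key and store its unique values as a plain list.
result = pd.Series(
    [lookup[key].tolist() for key in zip(test['a'], test['b'])],
    index=test.index,
    name='c',
)
print(result)
```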
