Pandas Groupby and Apply

I am running a groupby followed by apply on a dataframe, and it returns some strange results. I am using pandas 1.3.1.

Here is the code:

import pandas as pd

ddf = pd.DataFrame({
    "id": [1, 1, 1, 1, 2]
})

def do_something(df):
    # Returns a single scalar for the whole group
    return "x"

ddf["title"] = ddf.groupby("id").apply(do_something)
ddf

I expected every row in the title column to be assigned the value "x", but when I run this I get the following data:

   id title
0   1   NaN
1   1     x
2   1     x
3   1   NaN
4   2   NaN

Is this expected?

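The result is not strange, it is the correct behaviour: apply returns one value per group, here for groups 1 and 2, and those group labels become the index of the aggregated result: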

>>> list(ddf.groupby("id"))
[(1,        # the group name (the future index of the grouped df)
     id     # the subset dataframe of the group 1
  0   1
  1   1
  2   1
  3   1),
 (2,        # the group name (the future index of the grouped df)
     id     # the subset dataframe of the group 2
  4   2)]

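So why do you get any values at all? Because the group labels (1 and 2) happen to coincide with labels in your dataframe's index, and the assignment aligns on the index: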

>>> ddf.groupby("id").apply(do_something)
id
1    x
2    x
dtype: object
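The NaN pattern from the question follows from index alignment on assignment. Below is a minimal sketch that reproduces it. It selects the `id` column before calling apply, which is my own adjustment (not part of the original code) so that the snippet behaves the same on pandas versions newer than 1.3.1.

```python
import pandas as pd

ddf = pd.DataFrame({"id": [1, 1, 1, 1, 2]})

def do_something(group):
    # One scalar per group, as in the question
    return "x"

# One value per group; the index holds the group labels 1 and 2
per_group = ddf.groupby("id")["id"].apply(do_something)

# Column assignment aligns per_group's index (1, 2) with ddf's row index (0..4)
ddf["title"] = per_group

# Only the rows whose index label equals a group label receive a value
aligned_rows = ddf.index[ddf["title"].notna()].tolist()
print(aligned_rows)
```

The rows that get a value are the ones whose row label matches a group label, not the rows belonging to that group.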

Now change the ids like this, so that no group label matches an index label:

ddf['id'] += 10
#    id
# 0  11
# 1  11
# 2  11
# 3  11
# 4  12
ddf["title"] = ddf.groupby("id").apply(do_something)
#    id title
# 0  11   NaN
# 1  11   NaN
# 2  11   NaN
# 3  11   NaN
# 4  12   NaN

Or change the index instead:

ddf.index += 10
#     id
# 10   1
# 11   1
# 12   1
# 13   1
# 14   2
ddf["title"] = ddf.groupby("id").apply(do_something)
#     id title
# 10   1   NaN
# 11   1   NaN
# 12   1   NaN
# 13   1   NaN
# 14   2   NaN
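In both experiments the group labels no longer match any index label, so nothing aligns. If the goal is to copy each group's result onto every row of that group, one possible approach (a sketch, not something proposed in the original answer) is to look the result up by the `id` value with `Series.map` instead of relying on the row index. The name `per_group` is only illustrative.

```python
import pandas as pd

ddf = pd.DataFrame({"id": [1, 1, 1, 1, 2]})
ddf.index += 10  # row labels 10..14, none of which equals a group label

def do_something(group):
    # One scalar per group, as in the question
    return "x"

# Series indexed by group label: 1 -> "x", 2 -> "x"
per_group = ddf.groupby("id")["id"].apply(do_something)

# Look up each row's id in per_group, independent of the row index
ddf["title"] = ddf["id"].map(per_group)
print(ddf)
```

Because the lookup key is the `id` value rather than the row label, the result does not depend on what the index looks like.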

Yes, this is expected.

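First, the apply(do_something) part works like a charm; it is the groupby before it that causes the problem. groupby returns a GroupBy object, which is a bit different from a normal dataframe. If you debug and inspect what groupby returns, you can see that you need some kind of summary function to use it (mean, max or sum):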

df = ddf.groupby("id")
df.mean()

This gives the following result:

Empty DataFrame
Columns: []
Index: [1, 2]

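After that, do_something is applied once per group, so its result is labelled only with 1 and 2, and that result is then merged into your original df by index. That is why only the rows at index 1 and 2 have x. For now I would suggest dropping the groupby, because it is not clear why you are using it here, and taking a closer look at how GroupBy objects work.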
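To see what the GroupBy object holds, it can help to inspect it directly. The following is a small sketch using the same `ddf`; the variable names (`grouped`, `sizes`, `row_labels`) are only illustrative.

```python
import pandas as pd

ddf = pd.DataFrame({"id": [1, 1, 1, 1, 2]})

grouped = ddf.groupby("id")

# Number of distinct groups
n_groups = grouped.ngroups

# Rows per group, indexed by group label
sizes = grouped.size()

# Row labels of the original frame that fall into each group
row_labels = {name: list(group.index) for name, group in grouped}

print(n_groups)
print(sizes)
print(row_labels)
```

Anything computed once per group comes back indexed by the group labels, which is a different set of labels from the row index of the original frame.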

If you need a new column built from a per-group function, use GroupBy.transform. You need to specify the column to process after the groupby, here id:

ddf["title"] = ddf.groupby("id")['id'].transform(do_something)
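As a self-contained sketch of the transform call above: transform returns a result with the same index as the original frame, so the assignment lines up row by row. A scalar returned by the function is broadcast to every row of its group.

```python
import pandas as pd

ddf = pd.DataFrame({"id": [1, 1, 1, 1, 2]})

def do_something(group):
    # Scalar result; transform broadcasts it to each row of the group
    return "x"

# Same index as ddf (0..4), so column assignment aligns on every row
title = ddf.groupby("id")["id"].transform(do_something)
ddf["title"] = title
print(ddf)
```

Unlike apply in the question, the result here has one entry per original row rather than one entry per group.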

Or assign the new column inside the function:

def do_something(x):
    # x is the sub-dataframe for one group
    x['title'] = 'x'
    return x

ddf = ddf.groupby("id").apply(do_something)

Why your original code does not work is explained in the other answer.
