Restoring the index after Groupby.size() in pandas

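I need to restore the index (or at least keep it available) after groupby.size(), but somehow that doesn't work with .size(). I have read the Stack Overflow post "Pandas - Restore Index after Groupby", but all the helpful replies only use the max() aggregation function. What about the other ones?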

Some code examples:

df
Out[39]:
          product_id
order_id            
2103            7546
2103            8278
2103            6790
2104            7546
2104            8278
2104            6790

df.groupby('product_id', as_index=True).size()
Out[67]:
product_id
3587      1
3590      1
3680      2
6735      5
6744      1
6759      6

df.groupby('product_id', as_index=False).size()
Out[68]:
product_id
3587      1
3590      1
3680      2
6735      5
6744      1
6759      6

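As you can see, switching the as_index parameter between True and False has no effect on the index, whereas it does work with the .max() aggregate function. So the question is: how do I restore the index after groupby.size()?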

Expected output:

product_id
index   
2103 3587      1
2104 3590      1
2188 3680      2
2188 6735      5
2188 6744      1
2188 6759      6

Once you perform a groupby, the original index is lost. This is because, internally, pandas uses the grouper column(s) as the index of the result.
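A minimal sketch of what happens, assuming the same sample frame used in the answer code further down (order IDs 2103/2104 as an unnamed index). The result of groupby().size() has one row per product_id, and the order IDs do not survive anywhere in it:

```python
import pandas as pd

# Sample frame: order IDs as the (unnamed) index, one product per row
df = pd.DataFrame({'product_id': [7546, 8278, 6790, 7546, 8278, 6790]},
                  index=[2103, 2103, 2103, 2104, 2104, 2104])

sizes = df.groupby('product_id').size()
print(sizes)

# The grouper column has become the index of the result
print(sizes.index.name)  # product_id
# The original order IDs are gone: sizes has one entry per distinct product_id
print(2103 in sizes.index)  # False
```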

What you can do is promote the index to a column, map the product_id counts from a precomputed Series, and then set the index again.

For this task, you can use value_counts instead of groupby.size.
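To check that the swap is safe, here is a small sketch on the same sample data comparing the two. One caveat: value_counts orders its result by count (descending) rather than by key, and the index/Series names it produces differ between pandas versions, so the comparison below sorts by index and compares only labels and values:

```python
import pandas as pd

df = pd.DataFrame({'product_id': [7546, 8278, 6790, 7546, 8278, 6790]},
                  index=[2103, 2103, 2103, 2104, 2104, 2104])

# Both give a count per product_id, keyed by product_id
a = df.groupby('product_id').size().sort_index()
b = df['product_id'].value_counts().sort_index()

# Compare labels and counts only (names may differ by pandas version)
same = a.index.tolist() == b.index.tolist() and a.tolist() == b.tolist()
print(same)  # True
```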

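import pandas as pd

df = pd.DataFrame({'product_id': [7546, 8278, 6790, 7546, 8278, 6790]},
                  index=[2103, 2103, 2103, 2104, 2104, 2104])
c = df.product_id.value_counts()         # precomputed count per product_id
res = df.reset_index()                   # promote the unnamed index to a column called 'index'
res['count'] = res['product_id'].map(c)  # look up each row's count by product_id
res = res.set_index('index')             # restore the original index
print(res)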
       product_id  count
index                   
2103         7546      2
2103         8278      2
2103         6790      2
2104         7546      2
2104         8278      2
2104         6790      2
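A different technique that avoids the reset_index/set_index round trip is groupby(...).transform('size'): it broadcasts each group's size back onto every row of that group, so the result is already aligned with the original index. This is a swapped-in alternative, not the method above; a minimal sketch on the same sample data:

```python
import pandas as pd

df = pd.DataFrame({'product_id': [7546, 8278, 6790, 7546, 8278, 6790]},
                  index=[2103, 2103, 2103, 2104, 2104, 2104])

res = df.copy()
# Group the product_id Series by its own values; transform('size')
# returns one value per original row, carrying the original index,
# so it can be assigned straight back as a new column
res['count'] = df['product_id'].groupby(df['product_id']).transform('size')
print(res)
```

The original order IDs stay in place because nothing is ever aggregated away; each row just receives the size of the group it belongs to.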
