I'm trying to group rows using a list, which is one of the ways to group in pandas.

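Goal:

I want to group every N rows of a DataFrame, so I took the approach where groupby accepts a list as input and groups the rows according to it. Before getting to the problem, here is the code I use to group the rows.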

import math

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(0, 100, (100, 5)))
# Number of rows in each group
n_elems = 20
# Total rows in the dataset
n_rows = df.shape[0]
# Groups to be created (ceil handles a row count that is not a multiple of n_elems)
n_groups = math.ceil(n_rows / n_elems)
groups = []
for idx in range(n_groups):
    grp = [idx] * n_elems
    groups.extend(grp)

# Trim to the same length as the DataFrame, as groupby requires
groups = groups[:n_rows]
# Use the list to group by
df.groupby(groups).agg(['mean', 'count'])

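Problem:

When I set the number of rows per group (n_elems) to anything from 1 to 19, this works fine. If n_elems is 1 it produces 100 groups, if n_elems is 2 it produces 50 groups, if n_elems is 5 it produces 20 groups, and so on up to 19.

The problem appears at 20. I don't know why it is 20; for a different number of rows it might be some other number. Here, with n_elems set to 20, it should return 5 groups of 20 rows each. Instead it returns a strange-looking DataFrame with 100 rows and no column 0!

I tried looking this up but didn't find anything useful. Any help would give me a better understanding of groupby.

Thanks in advance.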

Try creating the groups by floor-dividing the index:

n_elems = 2
new_df = df.groupby(df.index // n_elems).agg(['mean', 'sum'])

      0          1          2
   mean  sum  mean  sum  mean  sum
0  57.5  115  75.5  151  34.5   69
1  71.0  142  17.0   34  53.0  106
2  21.0   42  48.5   97  78.5  157

Sample DataFrame used:

import numpy as np
import pandas as pd
np.random.seed(5)
df = pd.DataFrame(np.random.randint(0, 100, (6, 3)))

df:

    0   1   2
0  99  78  61
1  16  73   8
2  62  27  30
3  80   7  76
4  15  53  80
5  27  44  77
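
The same idea can be applied to the 100 x 5 frame from the question. The sketch below builds the group labels from row positions with np.arange instead of df.index, so it should also work when the index is not the default 0..n-1. As for why 20 in particular misbehaves: my understanding, which is not stated in the thread and should be treated as an assumption, is that when a plain list has the same length as the DataFrame and every value in it is also a column label, pandas may read the list as column names rather than per-row labels. With n_elems = 20 the labels are 0 to 4, which are exactly the column labels here, whereas smaller values of n_elems also produce labels of 5 and above. Passing a NumPy array instead of a list appears to avoid that ambiguity. The names rng, group_ids and result are only illustrative.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 100, (100, 5)))  # columns are labelled 0..4

n_elems = 20

# Positional group ids: 0 for the first n_elems rows, 1 for the next n_elems, ...
# Unlike df.index // n_elems, this does not rely on the index being 0..n-1.
group_ids = np.arange(len(df)) // n_elems

# group_ids is an ndarray with one label per row
result = df.groupby(group_ids).agg(['mean', 'count'])
print(result.shape)
```

With 100 rows and n_elems = 20 this gives 5 groups, each with a count of 20 in every column. If you would rather keep the list built in the question, wrapping it as np.asarray(groups) before passing it to groupby should have the same effect.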
