Calculating the mean of every N rows within a range

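I have a dataframe of size 700x20. My data are pixel intensities at specific coordinates in images. I have 14 people with 50 images each (14 × 50 = 700 rows). I am trying to do dimensionality reduction, and one step of this task requires me to compute the mean of each class. I have two classes. In my dataframe, each block of 50 rows holds the features of one class: rows 0 to 49 are class A features, rows 50 to 99 are class B, rows 100 to 149 are class A, rows 150 to 199 are class B, and so on.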

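What I want to do is take every Nth block of rows (the rows from N to M belonging to the same class) and compute their mean. Here is a link to the dataframe to illustrate the problem: dataframe pickle file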
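A minimal sketch of the row layout described above, assuming the real frame has exactly 700 rows in 50-row blocks that alternate A, B, A, B, … and 20 numeric feature columns. The data is random stand-in data, not the linked pickle file, and the names `BLOCK`, `features` and `is_class_a` are hypothetical. It picks out the class A rows with a boolean mask built from each row's block number, then cross-checks against stacking the slices explicitly:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real data: 14 blocks of 50 rows, 20 pixel columns.
rng = np.random.default_rng(0)
BLOCK = 50
features = pd.DataFrame(rng.integers(0, 256, size=(14 * BLOCK, 20)),
                        columns=range(1, 21))

# Row r belongs to block r // BLOCK; even blocks are class A, odd blocks class B.
is_class_a = (np.arange(len(features)) // BLOCK) % 2 == 0

mean_a = features[is_class_a].mean()    # rows 0-49, 100-149, 200-249, ...
mean_b = features[~is_class_a].mean()   # rows 50-99, 150-199, 250-299, ...

# Cross-check: stack the class A slices [s, s + BLOCK) explicitly and average them.
manual_a = pd.concat([features.iloc[s:s + BLOCK]
                      for s in range(0, len(features), 2 * BLOCK)]).mean()
print(np.allclose(mean_a.to_numpy(), manual_a.to_numpy()))
```

The mask keeps the "rows N to M of every other block" idea explicit without sorting or reshaping the frame.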

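What I tried was sorting the dataframe and computing the means separately, but it did not work: it computed a mean for every label and grouped the rows into 14 different classes instead of two.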

class_feature_means = pd.DataFrame(columns=target_names)
for c, rows in df.groupby('class'):
    # drop the categorical label column so only the pixel columns are averaged
    class_feature_means[c] = rows.drop(columns='class').mean()
class_feature_means
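Grouping by `'class'` produces one group per label, and every block has its own label, which is why this attempt yields 14 groups instead of 2. A minimal sketch of one way around it, assuming the labels are a `Categorical` whose codes run 0, 1, 2, … in block order (as in the reproducible example): collapse each code to its parity before grouping. The small frame and the column name `p1` are illustrative only:

```python
import pandas as pd

# Illustrative frame: 6 blocks of 2 rows, labelled Index_A ... Index_F in block order.
labels = pd.Categorical.from_codes([0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5],
                                   ["Index_" + c for c in "ABCDEF"])
df = pd.DataFrame({"p1": [1, 3, 10, 12, 5, 7, 14, 16, 9, 11, 18, 20]})
df["class"] = labels

# Grouping by the label itself gives one group per block (6 here, 14 in the real data).
print(df.groupby("class", observed=True)["p1"].mean())

# Even codes (A, C, E) -> 0, odd codes (B, D, F) -> 1: two groups, as intended.
parity = df["class"].cat.codes % 2
class_feature_means = df.drop(columns="class").groupby(parity).mean()
print(class_feature_means)
```

Here class 0 averages 1, 3, 5, 7, 9, 11 and class 1 averages 10, 12, 14, 16, 18, 20.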

Minimal reproducible example:

import string

import numpy as np
import pandas as pd

my_array = np.asarray([[31, 25, 17, 62],
                       [31, 26, 19, 59],
                       [31, 23, 17, 67],
                       [31, 23, 19, 67],
                       [31, 28, 17, 65],
                       [32, 26, 19, 62],
                       [32, 26, 17, 66],
                       [30, 24, 17, 68],
                       [29, 24, 17, 68],
                       [33, 24, 17, 68],
                       [32, 52, 16, 68],
                       [29, 24, 17, 68],
                       [33, 24, 17, 68],
                       [32, 52, 16, 68],
                       [29, 24, 17, 68],
                       [33, 24, 17, 68],
                       [32, 52, 16, 68],
                       [30, 25, 16, 97]])

my_array = my_array.reshape(18, 4)  # already 18x4; kept as in the original
# block code of each row: [0, 0, 0, 1, 1, 1, ..., 5, 5, 5] (6 blocks of 3 rows)
indices = sorted(list(range(0, my_array.shape[0] // 3)) * 3)
# {0: 'A', 1: 'B', ..., 5: 'F'}
class_dict = dict(zip(range(0, my_array.shape[0] // 3), string.ascii_uppercase))
target_names = ["Index_" + c for c in class_dict.values()]
pixel_index = [1, 2, 3, 4]

X = pd.DataFrame(my_array, columns=pixel_index)
y = pd.Categorical.from_codes(indices, target_names)
df = X.join(pd.Series(y, name='class'))

df

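Basically, what I want to do is combine A, C and E into a single class, take their sum and divide by 3 to get the mean of class A (call it class 0). Then combine B, D and F into a single class, take their sum and divide by 3 to get the mean of class B (class 1).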
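Because the blocks are contiguous and all the same size, the per-class average can also be sketched in plain NumPy by reshaping to (blocks, rows per block, features): average inside each block, then average the even blocks (A, C, E) and the odd blocks (B, D, F). This assumes the row count is an exact multiple of the block size `N`; the array is a copy of the 18×4 example so the snippet runs on its own:

```python
import numpy as np

N = 3  # rows per block
data = np.array([[31, 25, 17, 62], [31, 26, 19, 59], [31, 23, 17, 67],
                 [31, 23, 19, 67], [31, 28, 17, 65], [32, 26, 19, 62],
                 [32, 26, 17, 66], [30, 24, 17, 68], [29, 24, 17, 68],
                 [33, 24, 17, 68], [32, 52, 16, 68], [29, 24, 17, 68],
                 [33, 24, 17, 68], [32, 52, 16, 68], [29, 24, 17, 68],
                 [33, 24, 17, 68], [32, 52, 16, 68], [30, 25, 16, 97]])

# (18, 4) -> (6 blocks, 3 rows, 4 features), then average inside each block.
block_means = data.reshape(-1, N, data.shape[1]).mean(axis=1)

# Even blocks are A, C, E (class 0); odd blocks are B, D, F (class 1).
class0 = block_means[0::2].mean(axis=0)
class1 = block_means[1::2].mean(axis=0)
print(class0)
print(class1)
```

For column 1 the block means of A, C and E are 31, 30.333 and 31.333, so `class0[0]` is (31 + 30.333 + 31.333) / 3 ≈ 30.889.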

Create a helper array of group labels with integer division and modulo, pass it to groupby, aggregate with sum, and divide at the end:

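N = 3
# block number of each row (row // N), then its parity: 0 = A/C/E, 1 = B/D/F
arr = np.arange(len(df)) // N % 2
print(arr)
[0 0 0 1 1 1 0 0 0 1 1 1 0 0 0 1 1 1]
# drop the categorical 'class' column; summing it raises an error in recent pandas
df = df.drop(columns='class').groupby(arr).sum() / N
print(df)
           1          2          3           4
0  92.666667  82.666667  51.333333  198.000000
1  94.333333  92.666667  51.333333  210.333333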
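Dividing the class sum by N gives k times the class mean, where k is the number of blocks per class: a class with k blocks of N rows has kN rows, so sum / N = k · (sum / kN). In the example k = 3, so 92.666667 for column 1 is 31 + 30.333 + 31.333, the sum of the three block means; in the 700-row data (N = 50, k = 7) it would be 7 times the mean. If the per-class average is wanted, `.mean()` gives it directly. A minimal sketch with a hypothetical helper `class_means`, assuming contiguous, equal-sized blocks whose classes cycle 0, 1, …, n_classes − 1; non-numeric columns such as the categorical `class` are skipped with `select_dtypes`:

```python
import numpy as np
import pandas as pd


def class_means(frame: pd.DataFrame, block: int, n_classes: int = 2) -> pd.DataFrame:
    """Hypothetical helper: mean of each numeric column per class, where rows come in
    contiguous blocks of `block` rows and the classes cycle 0, 1, ..., n_classes - 1."""
    labels = np.arange(len(frame)) // block % n_classes
    return frame.select_dtypes(include="number").groupby(labels).mean()


X = pd.DataFrame(np.array([[31, 25, 17, 62], [31, 26, 19, 59], [31, 23, 17, 67],
                           [31, 23, 19, 67], [31, 28, 17, 65], [32, 26, 19, 62],
                           [32, 26, 17, 66], [30, 24, 17, 68], [29, 24, 17, 68],
                           [33, 24, 17, 68], [32, 52, 16, 68], [29, 24, 17, 68],
                           [33, 24, 17, 68], [32, 52, 16, 68], [29, 24, 17, 68],
                           [33, 24, 17, 68], [32, 52, 16, 68], [30, 25, 16, 97]]),
                 columns=[1, 2, 3, 4])

means = class_means(X, block=3)
sums_over_n = X.groupby(np.arange(len(X)) // 3 % 2).sum() / 3
print(means)
# sum / N equals k * mean, with k = 3 blocks per class here
print(np.allclose(sums_over_n.to_numpy(), 3 * means.to_numpy()))  # True
```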
