Iterate over a pandas DataFrame and select n rows and columns at a time



I have a dataset that looks like this:

# Example
     0  1     2   3  4   5
0   18  1   -19 -16 -5  19
1   18  0   -19 -17 -6  19
2   17  -1  -20 -17 -6  19
3   18  1   -19 -16 -5  20
4   18  0   -19 -16 -5  20

Actual data:

[{0: 18, 1: 1, 2: -19, 3: -16, 4: -5, 5: 19},
 {0: 18, 1: 0, 2: -19, 3: -17, 4: -6, 5: 19},
 {0: 17, 1: -1, 2: -20, 3: -17, 4: -6, 5: 19},
 {0: 18, 1: 1, 2: -19, 3: -16, 4: -5, 5: 20},
 {0: 18, 1: 0, 2: -19, 3: -16, 4: -5, 5: 20},
 {0: 18, 1: 0, 2: -20, 3: -15, 4: -4, 5: 20},
 {0: 19, 1: 1, 2: -18, 3: -16, 4: -5, 5: 20},
 {0: 18, 1: 0, 2: -19, 3: -17, 4: -7, 5: 18},
 {0: 18, 1: 0, 2: -20, 3: -18, 4: -7, 5: 18},
 {0: 17, 1: 0, 2: -19, 3: -17, 4: -7, 5: 18},
 {0: 18, 1: 0, 2: -19, 3: -16, 4: -4, 5: 20},
 {0: 18, 1: 1, 2: -19, 3: -16, 4: -5, 5: 20},
 {0: 18, 1: 0, 2: -19, 3: -16, 4: -4, 5: 20},
 {0: 18, 1: 0, 2: -19, 3: -16, 4: -5, 5: 20},
 {0: 18, 1: 1, 2: -18, 3: -16, 4: -5, 5: 20},
 {0: 17, 1: 0, 2: -20, 3: -16, 4: -5, 5: 19},
 {0: 17, 1: 0, 2: -19, 3: -16, 4: -4, 5: 20},
 {0: 18, 1: 0, 2: -19, 3: -15, 4: -4, 5: 20},
 {0: 18, 1: 0, 2: -19, 3: -14, 4: -3, 5: 22},
 {0: 18, 1: 1, 2: -18, 3: -14, 4: -4, 5: 22}]

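The shape of the above is (20, 6).

What I want to achieve is to apply a custom function to every column, 4 rows at a time.

Example:

  1. First iteration -> f() applied to all columns of df.ix[0:3];
  2. Second iteration -> f() applied to all columns of df.ix[4:7]

and so on...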

In a way, what I need is a rolling window of size 4 that also advances 4 rows at a time.

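With the data above, the result would be a DataFrame of shape (5, 6). For the sake of argument, you can assume the custom function is the mean of each column over the 4 rows.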

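What have I tried so far?

  1. I considered rolling, but rolling does not do what I need: it slides the window with a stride of 1.
  2. I did implement it with a loop, but because of the volume of data I really need to optimize it.

Here is the code: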

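import pandas as pd

curr = 0
res = []
while curr < df_to_look_at2.shape[0]:
    # .ix was removed in pandas 1.0; .iloc[curr:curr + 4] selects the same 4 rows
    look_at = df_to_look_at2.iloc[curr:curr + 4]
    curr += 4
    # Column-wise mean of the current 4-row block
    res.append(look_at.mean().values.tolist())
pd.DataFrame(res)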

And the result:

       0       1         2       3      4      5
0   17.75   0.25    -19.25  -16.50  -5.50   19.25
1   18.25   0.25    -19.00  -16.00  -5.25   19.50
2   17.75   0.25    -19.25  -16.75  -5.75   19.00
3   17.75   0.25    -19.00  -16.00  -4.75   19.75
4   17.75   0.25    -18.75  -14.75  -3.75   21.00

One additional thought: what if it is not only the mean, but min(), max(), and some other custom functions...

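Rolling would be the right tool here if you wanted each observation to be considered in more than one window. Your windows do not overlap, though, so what you are really asking is how to group by an arange combined with floor division.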

import numpy as np
window_size = 4
# Rows 0-3 get label 0, rows 4-7 get label 1, and so on
grouper = np.arange(df.shape[0]) // window_size
df.groupby(grouper).mean()

       0     1      2      3     4      5
0  17.75  0.25 -19.25 -16.50 -5.50  19.25
1  18.25  0.25 -19.00 -16.00 -5.25  19.50
2  17.75  0.25 -19.25 -16.75 -5.75  19.00
3  17.75  0.25 -19.00 -16.00 -4.75  19.75
4  17.75  0.25 -18.75 -14.75 -3.75  21.00
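
The same grouper should also cover the follow-up about min(), max() and custom functions, because groupby can take several aggregations at once through agg. A minimal sketch, assuming a small made-up frame in place of the real data; value_range is a hypothetical custom function used only for illustration:

```python
import numpy as np
import pandas as pd

# Small illustrative frame (made-up values, not the data from the question)
df = pd.DataFrame({"a": [1, 2, 3, 4, 5, 6, 7, 8],
                   "b": [8, 6, 4, 2, 0, -2, -4, -6]})

window_size = 4
# Rows 0-3 -> group 0, rows 4-7 -> group 1
grouper = np.arange(len(df)) // window_size


def value_range(col):
    # Hypothetical custom aggregation: spread of the values in one window
    return col.max() - col.min()


# Several aggregations per column; the result has MultiIndex columns
stats = df.groupby(grouper).agg(["mean", "min", "max", value_range])
print(stats)
```

If the number of rows is not a multiple of window_size, the last group simply holds the leftover rows, so no special handling is needed with this approach.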

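I think doing multiple computations this way really is NumPy's turf. You can use reshape to get the underlying array into the required shape, then compute whatever you need on that array.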

inp = [{0: 18, 1: 1, 2: -19, 3: -16, 4: -5, 5: 19},
 {0: 18, 1: 0, 2: -19, 3: -17, 4: -6, 5: 19},
 {0: 17, 1: -1, 2: -20, 3: -17, 4: -6, 5: 19},
 {0: 18, 1: 1, 2: -19, 3: -16, 4: -5, 5: 20},
 {0: 18, 1: 0, 2: -19, 3: -16, 4: -5, 5: 20},
 {0: 18, 1: 0, 2: -20, 3: -15, 4: -4, 5: 20},
 {0: 19, 1: 1, 2: -18, 3: -16, 4: -5, 5: 20},
 {0: 18, 1: 0, 2: -19, 3: -17, 4: -7, 5: 18},
 {0: 18, 1: 0, 2: -20, 3: -18, 4: -7, 5: 18},
 {0: 17, 1: 0, 2: -19, 3: -17, 4: -7, 5: 18},
 {0: 18, 1: 0, 2: -19, 3: -16, 4: -4, 5: 20},
 {0: 18, 1: 1, 2: -19, 3: -16, 4: -5, 5: 20},
 {0: 18, 1: 0, 2: -19, 3: -16, 4: -4, 5: 20},
 {0: 18, 1: 0, 2: -19, 3: -16, 4: -5, 5: 20},
 {0: 18, 1: 1, 2: -18, 3: -16, 4: -5, 5: 20},
 {0: 17, 1: 0, 2: -20, 3: -16, 4: -5, 5: 19},
 {0: 17, 1: 0, 2: -19, 3: -16, 4: -4, 5: 20},
 {0: 18, 1: 0, 2: -19, 3: -15, 4: -4, 5: 20},
 {0: 18, 1: 0, 2: -19, 3: -14, 4: -3, 5: 22},
 {0: 18, 1: 1, 2: -18, 3: -14, 4: -4, 5: 22}]
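import pandas as pd
df = pd.DataFrame(inp)
# Reshape to (n_windows, 4, n_columns); requires the row count to be a multiple of 4
temp = df.values.reshape(-1, 4, df.shape[-1])
# Average over the window axis
out = pd.DataFrame(temp.mean(axis=1))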

Output:

       0     1      2      3     4      5
0  17.75  0.25 -19.25 -16.50 -5.50  19.25
1  18.25  0.25 -19.00 -16.00 -5.25  19.50
2  17.75  0.25 -19.25 -16.75 -5.75  19.00
3  17.75  0.25 -19.00 -16.00 -4.75  19.75
4  17.75  0.25 -18.75 -14.75 -3.75  21.00
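
One caveat: reshape only works when the number of rows is an exact multiple of the window size. A rough sketch of one way to deal with leftovers, assuming it is acceptable to drop the trailing incomplete window; block_reduce is a hypothetical helper name and the data is made up:

```python
import numpy as np
import pandas as pd


def block_reduce(df, window_size, func):
    """Apply a NumPy reduction to consecutive, non-overlapping row blocks.

    Hypothetical helper: trailing rows that do not fill a whole window are dropped.
    """
    n_full = (len(df) // window_size) * window_size
    # Shape: (n_windows, window_size, n_columns)
    blocks = df.to_numpy()[:n_full].reshape(-1, window_size, df.shape[1])
    return pd.DataFrame(func(blocks, axis=1), columns=df.columns)


# Made-up data: 10 rows, so the last 2 rows are dropped for window_size=4
demo = pd.DataFrame({"x": np.arange(10), "y": np.arange(10) * 2})

means = block_reduce(demo, 4, np.mean)
maxes = block_reduce(demo, 4, np.max)
print(means)
print(maxes)
```

Any reduction that accepts an axis argument (np.min, np.std, np.median and so on) can be passed in the same way, which is what makes the reshaped array convenient for computing several statistics.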
