基于行值对Pandas DF中多个列切片的操作



我正在处理一些非常巨大的数据数组,这些数组是我在Pandas DataFrame中组织的。我的一个例子或多或少像这个

>>> pd.DataFrame({'vp':aux_vp,'vs':aux_vs,'den':aux_den,'layer':facies_vol})
vp           layer
0        5163.788741  0.0
1        5062.234019  0.0
2        4869.894684  0.0
3        9126.546268  1.0
4        5566.053159  1.0
...      ...
1254523  6177.467626  0.0
1254524  4756.891403  0.0
1254525  6244.816685  2.0

现在,我需要的是计算列"中的值的平均值;vp";,在由"0"的值定义的切片中;层";,从而预期输出将是

vp    layer  averages
0        5163  0.0    5031.3
1        5062  0.0    5031.3
2        4869  0.0    5031.3
3        9126  1.0    7346
4        5566  1.0    7346
...   ...    ...
1254523  6177  0.0    5466.5
1254524  4756  0.0    5466.5
1254525  6244  2.0    6244

每个切片中平均值的重复是奖励。我真正不能做的是在不解析所有行的情况下执行此操作。我已经尝试过使用numpy来实现这一点;层";数组发生变化,然后用for循环计算:

vp= np.array(...) #same as vp in pandas column
layer= np.array(...) #same as layer in pandas column
averages= np.zeros((len(vp))
indexes= np.add(np.where(layer[:-1] != layer[1:])[0],1) #here I compare the adjacent values of layer and store the index where they are different

for i in range(1,len(indexes)):
mean= np.mean(vp[indexes[i-1]:indexes[i]])
averages[indexes[i-1]:indexes[i]]=mean

但考虑到我的数据量,这需要很长时间。非常感谢!

IIUC您可以在使用shiftcumsum创建要分组的键列之后使用groupbytransform('mean')

df['averages'] = df.assign(key=(df['layer'] != df['layer'].shift()).cumsum()).groupby('key')['vp'].transform('mean')
vp  layer     averages
0        5163.788741    0.0  5031.972481
1        5062.234019    0.0  5031.972481
2        4869.894684    0.0  5031.972481
3        9126.546268    1.0  7346.299714
4        5566.053159    1.0  7346.299714
1254523  6177.467626    0.0  5467.179514
1254524  4756.891403    0.0  5467.179514
1254525  6244.816685    2.0  6244.816685

您可以分组以获得平均值,然后合并表。

mean = df.groupby(by='layer').agg({'value': 'mean'}).rename(columns={'value': 'mean'}).reset_index()
df.merge(mean, on='layer')
value  layer         mean
0    4998.663295    0.0  5000.727034
1    4999.460336    0.0  5000.727034
2    5002.241608    0.0  5000.727034
3    5000.057680    0.0  5000.727034
4    5000.036647    0.0  5000.727034
5    4999.525570    0.0  5000.727034
6    5002.602431    0.0  5000.727034
7    5000.774510    0.0  5000.727034
8    5001.872130    0.0  5000.727034
9    5002.036133    0.0  5000.727034
10   9999.825662    1.0  9999.648100
11   9999.707490    1.0  9999.648100
12   9999.601844    1.0  9999.648100
13   9999.137278    1.0  9999.648100
14   9999.544681    1.0  9999.648100
15   9999.971940    1.0  9999.648100
16  10001.009895    1.0  9999.648100
17   9999.212977    1.0  9999.648100
18  10001.271304    1.0  9999.648100
19   9997.197929    1.0  9999.648100

最新更新