I am working with some very large data arrays that I organize in a Pandas DataFrame. One of my examples looks more or less like this
>>> pd.DataFrame({'vp':aux_vp,'vs':aux_vs,'den':aux_den,'layer':facies_vol})
                  vp  layer
0        5163.788741    0.0
1        5062.234019    0.0
2        4869.894684    0.0
3        9126.546268    1.0
4        5566.053159    1.0
...              ...    ...
1254523  6177.467626    0.0
1254524  4756.891403    0.0
1254525  6244.816685    2.0
Now, what I need is to compute the mean of the values in the column "vp" over each slice defined by the values of "layer", so that the expected output would be
              vp  layer  averages
0           5163    0.0    5031.3
1           5062    0.0    5031.3
2           4869    0.0    5031.3
3           9126    1.0      7346
4           5566    1.0      7346
...          ...    ...       ...
1254523     6177    0.0    5466.5
1254524     4756    0.0    5466.5
1254525     6244    2.0      6244
Having the mean repeated within each slice is a bonus. What I really cannot afford is to do this by parsing all the rows. I have already tried it with numpy, finding the indexes where the "layer" array changes and then computing the means with a for loop:
import numpy as np

vp = np.array(...)     # same as the vp column of the DataFrame
layer = np.array(...)  # same as the layer column of the DataFrame
averages = np.zeros(len(vp))

# compare adjacent values of layer and store the indexes where they differ
# (shifted by one so each index points at the first element of a run);
# 0 and len(vp) are prepended/appended so the loop also covers the first
# and last runs
changes = np.where(layer[:-1] != layer[1:])[0] + 1
indexes = np.concatenate(([0], changes, [len(vp)]))

for i in range(1, len(indexes)):
    mean = np.mean(vp[indexes[i-1]:indexes[i]])
    averages[indexes[i-1]:indexes[i]] = mean
But given the amount of data I have, this takes a very long time. Thanks a lot in advance!
IIUC, you can use groupby and transform('mean') after creating the key column to group on with shift and cumsum:
df['averages'] = df.assign(key=(df['layer'] != df['layer'].shift()).cumsum()).groupby('key')['vp'].transform('mean')
                  vp  layer     averages
0        5163.788741    0.0  5031.972481
1        5062.234019    0.0  5031.972481
2        4869.894684    0.0  5031.972481
3        9126.546268    1.0  7346.299714
4        5566.053159    1.0  7346.299714
1254523  6177.467626    0.0  5467.179514
1254524  4756.891403    0.0  5467.179514
1254525  6244.816685    2.0  6244.816685
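To see what the key column looks like, here is a minimal sketch with made-up values: (layer != layer.shift()) is True at the start of every new run, and cumsum turns that into an integer run id, so runs with the same layer value at different positions are averaged separately.

```python
import pandas as pd

# made-up values: layer 0.0 appears in two separate runs
df = pd.DataFrame({'vp': [10.0, 20.0, 30.0, 40.0, 50.0, 60.0],
                   'layer': [0.0, 0.0, 1.0, 1.0, 0.0, 2.0]})

# True at the start of every new run; cumsum turns that into a run id
key = (df['layer'] != df['layer'].shift()).cumsum()
print(key.tolist())  # [1, 1, 2, 2, 3, 4]

df['averages'] = df.groupby(key)['vp'].transform('mean')
print(df['averages'].tolist())  # [15.0, 15.0, 35.0, 35.0, 50.0, 60.0]
```

Note that the fifth row (layer 0.0 again) gets its own mean, 50.0, rather than being pooled with the first run of zeros.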
You can group to get the means, then merge the tables.
mean = (df.groupby(by='layer')
          .agg({'value': 'mean'})
          .rename(columns={'value': 'mean'})
          .reset_index())
df.merge(mean, on='layer')
           value  layer         mean
0    4998.663295    0.0  5000.727034
1    4999.460336    0.0  5000.727034
2    5002.241608    0.0  5000.727034
3    5000.057680    0.0  5000.727034
4    5000.036647    0.0  5000.727034
5    4999.525570    0.0  5000.727034
6    5002.602431    0.0  5000.727034
7    5000.774510    0.0  5000.727034
8    5001.872130    0.0  5000.727034
9    5002.036133    0.0  5000.727034
10   9999.825662    1.0  9999.648100
11   9999.707490    1.0  9999.648100
12   9999.601844    1.0  9999.648100
13   9999.137278    1.0  9999.648100
14   9999.544681    1.0  9999.648100
15   9999.971940    1.0  9999.648100
16  10001.009895    1.0  9999.648100
17   9999.212977    1.0  9999.648100
18  10001.271304    1.0  9999.648100
19   9997.197929    1.0  9999.648100