Combining pandas DataFrames with different sample rates



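I have three pandas DataFrames containing data recorded during a test: one for temperature, one for vacuum, and one for voltage.

The data was captured independently, so the timestamps in each frame do not line up. Only occasionally does a timestamp in one frame also appear in another.

What I want to do is merge them into a single DataFrame and then interpolate the missing values so that I end up with one complete DataFrame.

I am new to pandas and have been poking around, but I don't feel like I'm getting anywhere, and I can't tell whether I'm on the right track.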

import pandas as pd
import numpy as np
rng1 = pd.date_range(
    '1/1/2012', 
    periods=10, 
    freq='H'
)
s1 = pd.Series(
    np.arange(10),
    index=rng1
)
df1 = pd.DataFrame(
    {'temp': s1}
)
s2 = pd.Series(
    np.arange(5, 10),
    index=['1/1/2012 01:20:00',
           '1/1/2012 01:40:00',
           '1/1/2012 02:00:00',
           '1/1/2012 05:30:00',
           '1/1/2012 06:00:00']
)
df2 = pd.DataFrame(
    {'voltage': s2},
)
print(df1)
print(df2)
--output:--
                     temp
2012-01-01 00:00:00     0
2012-01-01 01:00:00     1
2012-01-01 02:00:00     2
2012-01-01 03:00:00     3
2012-01-01 04:00:00     4
2012-01-01 05:00:00     5
2012-01-01 06:00:00     6
2012-01-01 07:00:00     7
2012-01-01 08:00:00     8
2012-01-01 09:00:00     9
                   voltage
1/1/2012 01:20:00        5
1/1/2012 01:40:00        6
1/1/2012 02:00:00        7
1/1/2012 05:30:00        8
1/1/2012 06:00:00        9
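
Note that the voltage index above prints as plain strings rather than timestamps. A minimal sketch of converting such an index before joining, assuming every label is a parseable month/day/year timestamp (the variable name `voltage` is only illustrative):

```python
import pandas as pd

# Same voltage samples as above, indexed by timestamp strings
voltage = pd.DataFrame(
    {"voltage": [5, 6, 7, 8, 9]},
    index=[
        "1/1/2012 01:20:00",
        "1/1/2012 01:40:00",
        "1/1/2012 02:00:00",
        "1/1/2012 05:30:00",
        "1/1/2012 06:00:00",
    ],
)

# Replace the string labels with real timestamps so the frame can be
# aligned against another frame that has a DatetimeIndex
voltage.index = pd.to_datetime(voltage.index)
print(voltage.index)
```

With a DatetimeIndex on both frames, an outer join aligns rows by actual time rather than by string label.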

combined = df1.join(df2, how='outer')
print(combined)
--output:--
                     temp  voltage
2012-01-01 00:00:00     0      NaN
2012-01-01 01:00:00     1      NaN
2012-01-01 01:20:00   NaN        5
2012-01-01 01:40:00   NaN        6
2012-01-01 02:00:00     2        7
2012-01-01 03:00:00     3      NaN
2012-01-01 04:00:00     4      NaN
2012-01-01 05:00:00     5      NaN
2012-01-01 05:30:00   NaN        8
2012-01-01 06:00:00     6        9
2012-01-01 07:00:00     7      NaN
2012-01-01 08:00:00     8      NaN
2012-01-01 09:00:00     9      NaN
# Interpolate every column, weighting by the time gaps in the index
combined = combined.interpolate(method='time')
print(combined)
--output:--
                         temp   voltage
2012-01-01 00:00:00  0.000000       NaN
2012-01-01 01:00:00  1.000000       NaN
2012-01-01 01:20:00  1.333333  5.000000
2012-01-01 01:40:00  1.666667  6.000000
2012-01-01 02:00:00  2.000000  7.000000
2012-01-01 03:00:00  3.000000  7.285714
2012-01-01 04:00:00  4.000000  7.571429
2012-01-01 05:00:00  5.000000  7.857143
2012-01-01 05:30:00  5.500000  8.000000
2012-01-01 06:00:00  6.000000  9.000000
2012-01-01 07:00:00  7.000000  9.000000
2012-01-01 08:00:00  8.000000  9.000000
2012-01-01 09:00:00  9.000000  9.000000
# Back-fill the leading NaNs that interpolation cannot reach
print(combined.bfill())
--output:--
                         temp   voltage
2012-01-01 00:00:00  0.000000  5.000000
2012-01-01 01:00:00  1.000000  5.000000
2012-01-01 01:20:00  1.333333  5.000000
2012-01-01 01:40:00  1.666667  6.000000
2012-01-01 02:00:00  2.000000  7.000000
2012-01-01 03:00:00  3.000000  7.285714
2012-01-01 04:00:00  4.000000  7.571429
2012-01-01 05:00:00  5.000000  7.857143
2012-01-01 05:30:00  5.500000  8.000000
2012-01-01 06:00:00  6.000000  9.000000
2012-01-01 07:00:00  7.000000  9.000000
2012-01-01 08:00:00  8.000000  9.000000
2012-01-01 09:00:00  9.000000  9.000000
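
For reference, the steps above can be condensed into one helper. This is a minimal sketch under stated assumptions, not a definitive approach: the function name `combine_and_interpolate` is made up for illustration, the timestamps are rewritten in ISO format, and each index is converted with `pd.to_datetime` on the assumption that it holds unique, parseable timestamps.

```python
import numpy as np
import pandas as pd


def combine_and_interpolate(frames):
    """Outer-join frames on their time index, then fill the gaps.

    Hypothetical helper: assumes each frame has a unique index of
    timestamps (or strings that pd.to_datetime can parse).
    """
    aligned = []
    for frame in frames:
        frame = frame.copy()
        frame.index = pd.to_datetime(frame.index)
        aligned.append(frame)

    # Concatenating along columns keeps the union of all timestamps
    combined = pd.concat(aligned, axis=1).sort_index()
    # Interpolate each column, weighting by the time between samples
    combined = combined.interpolate(method="time")
    # Values before a column's first sample cannot be interpolated,
    # so copy the first sample backwards
    return combined.bfill()


temp = pd.DataFrame(
    {"temp": np.arange(10)},
    index=pd.date_range("2012-01-01", periods=10, freq=pd.Timedelta(hours=1)),
)
voltage = pd.DataFrame(
    {"voltage": np.arange(5, 10)},
    index=[
        "2012-01-01 01:20:00",
        "2012-01-01 01:40:00",
        "2012-01-01 02:00:00",
        "2012-01-01 05:30:00",
        "2012-01-01 06:00:00",
    ],
)

combined = combine_and_interpolate([temp, voltage])
print(combined)
```

Because the helper takes a list, a third frame (for example the vacuum data) could be passed in the same call. Whether back-filling the leading values is acceptable depends on the measurement; leaving them as NaN is the more conservative choice.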
