Finding missing timestamps in a pandas DataFrame



I have the following dataset in a DataFrame:

Time_stamp           x        y
'2012-01-01 00:00:00'   8.97    1310.03
'2012-01-01 00:10:00'   9.91    1684.52
'2012-01-01 00:40:00'   9.64    1532.05
'2012-01-01 00:50:00'   11.84   1997.87
'2012-01-01 00:60:00'   11.69   2135.76
'2012-01-01 01:00:00'   12.14   2149.54
'2012-01-01 01:10:00'   13.43   2056.35
'2012-01-01 01:20:00'   9.88    1633.45
'2012-01-01 01:30:00'   9.01    1315.85
'2012-01-01  01:50:00'   8.33    1141.84

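As you can see, the data is recorded every 10 minutes. However, some timestamps and their corresponding values are missing, for example '2012-01-01 00:20:00' and '2012-01-01 00:30:00'. I would like to find such missing timestamps and fill their corresponding values with nan, like this: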

Time_stamp           x      y
'2012-01-01 00:20:00'   nan    nan
'2012-01-01 00:30:00'   nan    nan

Any ideas on how to do this efficiently without much code?

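First convert the values to datetimes. The 60 minutes in 2012-01-01 00:60:00 is invalid, so with errors='coerce' that value is replaced by NaT. Drop the rows where the timestamp is NaT, then set the column as a DatetimeIndex and add the missing datetimes with DataFrame.asfreq.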
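To see why the 00:60:00 row has to be dropped, here is a minimal check of the parsing step on its own. The two-element Series `raw` is made up for illustration and is not part of the original data frame.

```python
import pandas as pd

# Two sample strings: one valid, one with an out-of-range minute field.
raw = pd.Series(["'2012-01-01 00:50:00'", "'2012-01-01 00:60:00'"])

# Strip the surrounding quotes, then parse; unparseable entries become NaT.
parsed = pd.to_datetime(raw.str.strip("'"), errors="coerce")

# True marks the entries that could not be parsed.
print(parsed.isna().tolist())
```

Without errors='coerce', the invalid string would raise an exception instead of becoming NaT.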

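import pandas as pd
# Strip the quotes and parse; the invalid 00:60:00 becomes NaT
df['Time_stamp'] = pd.to_datetime(df['Time_stamp'].str.strip("'"), errors='coerce')
# Drop the NaT row, index by timestamp, and insert NaN rows on the 10-minute grid
df = df.dropna(subset=['Time_stamp']).set_index('Time_stamp').asfreq('10min')
print(df)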
                         x        y
Time_stamp                         
2012-01-01 00:00:00   8.97  1310.03
2012-01-01 00:10:00   9.91  1684.52
2012-01-01 00:20:00    NaN      NaN
2012-01-01 00:30:00    NaN      NaN
2012-01-01 00:40:00   9.64  1532.05
2012-01-01 00:50:00  11.84  1997.87
2012-01-01 01:00:00  12.14  2149.54
2012-01-01 01:10:00  13.43  2056.35
2012-01-01 01:20:00   9.88  1633.45
2012-01-01 01:30:00   9.01  1315.85
2012-01-01 01:40:00    NaN      NaN
2012-01-01 01:50:00   8.33  1141.84
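
If you also want the list of missing timestamps themselves, one possible approach is to build the full 10-minute grid with pd.date_range and take the difference with the existing index. The sketch below rebuilds the question's data by hand, which is an assumption about how the frame was loaded. The names `expected`, `missing` and `filled` are illustrative only.

```python
import pandas as pd

# Hand-typed copy of the question's data (assumed layout), including the
# invalid 00:60:00 row.
df = pd.DataFrame({
    "Time_stamp": [
        "'2012-01-01 00:00:00'", "'2012-01-01 00:10:00'",
        "'2012-01-01 00:40:00'", "'2012-01-01 00:50:00'",
        "'2012-01-01 00:60:00'", "'2012-01-01 01:00:00'",
        "'2012-01-01 01:10:00'", "'2012-01-01 01:20:00'",
        "'2012-01-01 01:30:00'", "'2012-01-01 01:50:00'",
    ],
    "x": [8.97, 9.91, 9.64, 11.84, 11.69, 12.14, 13.43, 9.88, 9.01, 8.33],
    "y": [1310.03, 1684.52, 1532.05, 1997.87, 2135.76,
          2149.54, 2056.35, 1633.45, 1315.85, 1141.84],
})

# Parse the timestamps, drop the unparseable row, and index by timestamp.
df["Time_stamp"] = pd.to_datetime(df["Time_stamp"].str.strip("'"), errors="coerce")
df = df.dropna(subset=["Time_stamp"]).set_index("Time_stamp")

# Every timestamp that should exist between the first and last record.
expected = pd.date_range(df.index.min(), df.index.max(), freq="10min")

# Timestamps on the grid that are absent from the data.
missing = expected.difference(df.index)
print(missing)

# Reindexing onto the grid inserts NaN rows for the missing timestamps.
filled = df.reindex(expected)
print(filled)
```

Here `missing` holds only the absent timestamps, while `filled` contains the same rows as the asfreq result above.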
