I have the following dataset in a DataFrame:
Time_stamp x y
'2012-01-01 00:00:00' 8.97 1310.03
'2012-01-01 00:10:00' 9.91 1684.52
'2012-01-01 00:40:00' 9.64 1532.05
'2012-01-01 00:50:00' 11.84 1997.87
'2012-01-01 00:60:00' 11.69 2135.76
'2012-01-01 01:00:00' 12.14 2149.54
'2012-01-01 01:10:00' 13.43 2056.35
'2012-01-01 01:20:00' 9.88 1633.45
'2012-01-01 01:30:00' 9.01 1315.85
'2012-01-01 01:50:00' 8.33 1141.84
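As you can see, the data is recorded every 10 minutes. However, some timestamps and their corresponding values are missing, for example '2012-01-01 00:20:00' and '2012-01-01 00:30:00'. I would like to find these missing timestamps and fill in nan for their corresponding values, like this: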
timestamp x y
`'2012-01-01 00:20:00'` nan nan
`'2012-01-01 00:30:00'` nan nan
Any ideas on how to do this efficiently without too much code?
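First convert the values to datetimes. The 60 minutes in 2012-01-01 00:60:00 is not a valid time, so with errors='coerce' that value is replaced by NaT. Drop the rows with the bad NaT values, then create a DatetimeIndex and add the missing datetimes with DataFrame.asfreq: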
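import pandas as pd
# strip the surrounding quotes and parse; unparseable values such as 00:60:00 become NaT
df['Time_stamp'] = pd.to_datetime(df['Time_stamp'].str.strip("'"), errors='coerce')
# drop the NaT rows, index by timestamp, and insert NaN rows for the missing 10-minute slots
df = df.dropna(subset=['Time_stamp']).set_index('Time_stamp').asfreq('10min')
print(df)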
x y
Time_stamp
2012-01-01 00:00:00 8.97 1310.03
2012-01-01 00:10:00 9.91 1684.52
2012-01-01 00:20:00 NaN NaN
2012-01-01 00:30:00 NaN NaN
2012-01-01 00:40:00 9.64 1532.05
2012-01-01 00:50:00 11.84 1997.87
2012-01-01 01:00:00 12.14 2149.54
2012-01-01 01:10:00 13.43 2056.35
2012-01-01 01:20:00 9.88 1633.45
2012-01-01 01:30:00 9.01 1315.85
2012-01-01 01:40:00 NaN NaN
2012-01-01 01:50:00 8.33 1141.84
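
If you also want the missing timestamps themselves as a list, one possible approach is to build the full 10-minute range and take its difference with the timestamps that are present. The following is only a sketch: it rebuilds the sample data from the question (quotes included), and the variable names `parsed`, `valid`, `full_range`, `missing`, and `filled` are illustrative assumptions, not part of the original code.

```python
import pandas as pd

# Sample rows copied from the question; the quotes around each timestamp are kept on purpose.
df = pd.DataFrame({
    "Time_stamp": [
        "'2012-01-01 00:00:00'", "'2012-01-01 00:10:00'", "'2012-01-01 00:40:00'",
        "'2012-01-01 00:50:00'", "'2012-01-01 00:60:00'", "'2012-01-01 01:00:00'",
        "'2012-01-01 01:10:00'", "'2012-01-01 01:20:00'", "'2012-01-01 01:30:00'",
        "'2012-01-01 01:50:00'",
    ],
    "x": [8.97, 9.91, 9.64, 11.84, 11.69, 12.14, 13.43, 9.88, 9.01, 8.33],
    "y": [1310.03, 1684.52, 1532.05, 1997.87, 2135.76, 2149.54,
          2056.35, 1633.45, 1315.85, 1141.84],
})

# Parse the timestamps; the invalid 00:60:00 entry becomes NaT and is dropped.
parsed = pd.to_datetime(df["Time_stamp"].str.strip("'"), errors="coerce")
valid = parsed.dropna()

# Every 10-minute slot between the first and last valid timestamp.
full_range = pd.date_range(valid.min(), valid.max(), freq="10min")

# Slots in the full range that do not appear in the data.
missing = full_range.difference(pd.DatetimeIndex(valid))
print(missing)

# Reindexing on the full range gives the same NaN-filled frame as asfreq.
filled = (
    df.assign(Time_stamp=parsed)
      .dropna(subset=["Time_stamp"])
      .set_index("Time_stamp")
      .reindex(full_range)
)
print(filled)
```

Here `missing` is a DatetimeIndex you can inspect or log, and `filled` should match the asfreq result above, apart from the index no longer being named Time_stamp.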