I have a dataset that contains many NaN values alongside numeric values, in the following form:
PV_Power
2017-01-01 00:00:00 NaN
2017-01-01 01:00:00 NaN
2017-01-01 02:00:00 NaN
2017-01-01 03:00:00 NaN
2017-01-01 04:00:00 NaN
... ...
2017-12-31 20:00:00 NaN
2017-12-31 21:00:00 NaN
2017-12-31 22:00:00 NaN
2017-12-31 23:00:00 NaN
2018-01-01 00:00:00 NaN
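What I need: if a NaN value sits among other NaN values, replace it with 0; if it sits between numeric values, replace it with an interpolated result. Do you know how I can do this?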
If you need to interpolate only between numeric values, use DataFrame.interpolate with limit_area='inside', then replace the remaining missing values with 0:
print (df)
PV_Power
date
2017-01-01 00:00:00 NaN
2017-01-01 01:00:00 4.0
2017-01-01 02:00:00 NaN
2017-01-01 03:00:00 NaN
2017-01-01 04:00:00 5.0
2017-01-01 05:00:00 NaN
2017-01-01 06:00:00 NaN
df = df.interpolate(limit_area='inside').fillna(0)
print (df)
PV_Power
date
2017-01-01 00:00:00 0.000000
2017-01-01 01:00:00 4.000000
2017-01-01 02:00:00 4.333333
2017-01-01 03:00:00 4.666667
2017-01-01 04:00:00 5.000000
2017-01-01 05:00:00 0.000000
2017-01-01 06:00:00 0.000000
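For anyone who wants to reproduce the output above, here is a minimal self-contained sketch. The sample frame is rebuilt by hand from the printed values (the index construction is my assumption about how the data was loaded), and the default linear method is used, which treats rows as equally spaced.

```python
import numpy as np
import pandas as pd

# Rebuild the 7-row sample shown above; hourly spacing is taken from the printout.
idx = pd.date_range("2017-01-01 00:00:00", periods=7, freq=pd.Timedelta(hours=1), name="date")
df = pd.DataFrame(
    {"PV_Power": [np.nan, 4.0, np.nan, np.nan, 5.0, np.nan, np.nan]},
    index=idx,
)

# limit_area='inside' fills only NaNs that have a valid value on both sides;
# leading and trailing NaNs are left untouched and then set to 0 by fillna.
result = df.interpolate(limit_area="inside").fillna(0)
print(result)
```

Only the two NaNs between 4.0 and 5.0 are interpolated; the NaN before 4.0 and the two after 5.0 end up as 0.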
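You can drop the NaN rows and reindex the DataFrame, filling the gaps with 0 (note that this sets every NaN to 0 and does not interpolate):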
idx = df.index
df = df.dropna().reindex(idx, fill_value=0)
Or set only the PV_Power values that are NaN to 0:
df.loc[pd.isna(df.PV_Power), ["PV_Power"]] = 0
You can use fillna(0):
# assign the result back; inplace=True on a column selection may not update df in recent pandas
df['PV_Power'] = df['PV_Power'].fillna(0)
Or you can replace it:
import numpy as np
df['PV_Power'] = df['PV_Power'].replace(np.nan, 0)
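To make the difference between the suggestions concrete, the sketch below runs the zero-filling variants from this thread on a small hand-made sample (the sample values and variable names are my own, chosen only for illustration) and compares them with the interpolate approach. It suggests that the zero-filling variants agree with each other, but none of them interpolates the NaNs that lie between numeric values.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: NaNs at the edges and between two numeric values.
df = pd.DataFrame({"PV_Power": [np.nan, 4.0, np.nan, np.nan, 5.0, np.nan, np.nan]})

# Variant 1: drop NaN rows, then reindex and fill the gaps with 0.
a = df.dropna().reindex(df.index, fill_value=0)

# Variant 2: boolean mask with .loc.
b = df.copy()
b.loc[pd.isna(b.PV_Power), ["PV_Power"]] = 0

# Variant 3: fillna, assigned back to the column.
c = df.copy()
c["PV_Power"] = c["PV_Power"].fillna(0)

# Variant 4: replace NaN with 0.
d = df.copy()
d["PV_Power"] = d["PV_Power"].replace(np.nan, 0)

# For comparison: interpolate inside gaps first, then zero-fill the rest.
interpolated = df.interpolate(limit_area="inside").fillna(0)

print(a["PV_Power"].tolist())
print(interpolated["PV_Power"].tolist())
```

If the NaNs between numeric values must be interpolated, as the question asks, the interpolate call has to run before any of the zero-filling steps.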