Adding rows to a pandas DataFrame using interpolate



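I am trying to interpolate a pandas DataFrame that contains time series data. I have hourly data for temp, and I want to interpolate the values of temp at the half-hour points. That would give me an estimate of temp for every trading period of every day: 24 hours per day, hence 48 trading periods per day.

My MWE is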

import numpy as np
import pandas as pd
from datetime import datetime, date, timedelta
import pyarrow as pa
import pyarrow.parquet as pq
# my dataset
df = pd.DataFrame()
d1 = '2020-10-21'
d2 = '2020-10-22'
df['date'] = pd.to_datetime([d1]*24+[d2]*24, format='%Y-%m-%d')
df['time'] = pd.date_range(d1, periods=len(df), freq='H').time
df['temp'] = pd.DataFrame((50+20*np.sin(np.linspace(0,0.91*np.pi,len(df))))).values
# combine time and date
df.loc[:,'datetime'] = pd.to_datetime(df.date.astype(str)+' '+df.time.astype(str))
df = df.drop(['date','time'], axis=1)
df = df.set_index('datetime')
# trading period
df['tp'] = pd.DataFrame(df.index.hour.values*2+1).values
# interpolate to find temp and datetime for trading periods 2,4,6,...
for n in df.tp.values:
    df.loc[-1,'tp'] = n+1
    df = df.sort_values('tp').reset_index(drop=True)
#df = df.interpolate(method='linear')
print(df.head(10))

I am adapting the answer from this post, but I get the error TypeError: value should be a 'Timestamp' or 'NaT'. Got 'int' instead. I suspect it is caused by the line df.loc[-1,'tp'] = n+1, but I am not sure how to fix it.

Try:

df = df.resample('30T').mean().interpolate()
df['tp'] = ((df.index.hour * 60 + df.index.minute) / 30 + 1).astype(int)
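For reference, here is a minimal self-contained sketch of the same approach. It rebuilds the question's data directly on a DatetimeIndex; the names `hourly` and `half_hourly` are only illustrative. It uses the `'30min'` alias in place of `'30T'`, on the assumption that you may be running a recent pandas release where the `'T'` alias is deprecated.

```python
import numpy as np
import pandas as pd

# Hourly temperatures over two days, indexed by timestamp
# (same values as the question's MWE, built without the date/time detour).
idx = pd.date_range("2020-10-21", periods=48,
                    freq=pd.Timedelta(hours=1), name="datetime")
hourly = pd.DataFrame(
    {"temp": 50 + 20 * np.sin(np.linspace(0, 0.91 * np.pi, 48))},
    index=idx,
)

# Upsample to 30-minute bins. The new half-hour bins are empty, so mean()
# yields NaN there, and interpolate() fills them linearly.
half_hourly = hourly.resample("30min").mean().interpolate()

# Trading period 1..48, derived from each row's time of day.
minutes = half_hourly.index.hour * 60 + half_hourly.index.minute
half_hourly["tp"] = (minutes // 30 + 1).astype(int)

print(half_hourly.head())
```

Because the index stays a DatetimeIndex throughout, no integer label is ever inserted into it, which is what the loop in the question trips over.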

Use asfreq, then interpolate:

In [36]: df.asfreq('30T').interpolate()
Out[36]:
                          temp    tp
datetime
2020-10-21 00:00:00  50.000000   1.0
2020-10-21 00:30:00  50.607891   2.0
2020-10-21 01:00:00  51.215782   3.0
2020-10-21 01:30:00  51.821424   4.0
2020-10-21 02:00:00  52.427066   5.0
...                        ...   ...
2020-10-22 21:00:00  57.869280  43.0
2020-10-22 21:30:00  57.303145  44.0
2020-10-22 22:00:00  56.737010  45.0
2020-10-22 22:30:00  56.158416  46.0
2020-10-22 23:00:00  55.579822  47.0
[95 rows x 2 columns]
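One caveat with interpolating every column: tp resets from 47 to 1 at midnight, so the interpolated value at 2020-10-21 23:30 comes out as 24.0 instead of 48. A possible workaround, sketched below with illustrative names (`naive`, `out`), is to interpolate only temp and recompute tp from the index. The `'30min'` alias is used here as an assumption about running a recent pandas version.

```python
import numpy as np
import pandas as pd

# Rebuild the question's hourly data on a DatetimeIndex.
idx = pd.date_range("2020-10-21", periods=48,
                    freq=pd.Timedelta(hours=1), name="datetime")
df = pd.DataFrame(
    {"temp": 50 + 20 * np.sin(np.linspace(0, 0.91 * np.pi, 48))},
    index=idx,
)
df["tp"] = df.index.hour * 2 + 1

# Interpolating every column also interpolates tp across midnight.
naive = df.asfreq("30min").interpolate()
midnight_gap = pd.Timestamp("2020-10-21 23:30")
print(naive.loc[midnight_gap, "tp"])  # halfway between 47 and 1

# Interpolate temp only, then derive tp from the time of day.
out = df.asfreq("30min")
out["temp"] = out["temp"].interpolate()
out["tp"] = out.index.hour * 2 + out.index.minute // 30 + 1
print(out.loc[midnight_gap, "tp"])
```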
