Imagine OHLC data indexed by DateTime. I want to resample this dataframe on the n-th day of each month.
For example:
.
.
.
2020-09-24 1.0990 1.1000 1.0982 1.0991
2020-09-25 1.1018 1.1025 1.0964 1.0995
2020-09-26 1.1011 1.1020 1.1009 1.1018
.
.
.
2020-10-24 1.1045 1.1068 1.0995 1.1017
2020-10-25 1.1031 1.1074 1.1021 1.1045
2020-10-26 1.1071 1.1076 1.1012 1.1031
.
.
.
2020-11-23 1.1005 1.1075 1.0989 1.1071
2020-11-26 1.1079 1.1086 1.0992 1.1005
2020-11-27 1.1076 1.1087 1.1068 1.1079
.
.
.
2020-12-24 1.1058 1.1110 1.1054 1.1071
2020-12-25 1.1010 1.1087 1.0926 1.1058
2020-12-26 1.1049 1.1056 1.0983 1.1010
.
.
.
2021-01-24 1.1049 1.1059 1.1029 1.1048
2021-01-25 1.1025 1.1068 1.1014 1.1049
2021-01-26 1.1025 1.1028 1.1022 1.1025
What I need is:
2020-09-25 1.1018 1.1025 1.0964 1.0995
2020-10-25 1.1031 1.1074 1.1021 1.1045
2020-11-25 1.1005 1.1075 1.0989 1.1071
2020-12-25 1.1010 1.1087 1.0926 1.1058
2021-01-25 1.1025 1.1068 1.1014 1.1049
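In fact, I need to resample on the 25th of each month, and if there is no data for that day, it must be filled with the most recent previous data.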
df[df.groupby(df['Date']+df['Date'].apply(lambda x: pd.DateOffset(days=25-x.day) if x.day<=25 else pd.DateOffset(days=25-x.day,months=1)))['Date'].transform(max)==df['Date']]
The simplest solution is as follows:
# Sort the dataframe by date
df = df.sort_values('date')
# Use ffill to fill null elements with the nearest previous value (by timestamp)
df = df.ffill(axis=0)
# Simply select the rows whose day of month is 25
df.loc[df['date'].dt.day == 25]
Output:
date a b c d
1 2020-09-25 1.1018 1.1025 1.0964 1.0995
4 2020-10-25 1.1031 1.1074 1.1021 1.1045
10 2020-12-25 1.1010 1.1087 1.0926 1.1058
13 2021-01-25 1.1025 1.1068 1.1014 1.1049
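Note that the output above has no row for 2020-11-25: ffill only fills null cells in rows that already exist, it does not create rows for missing dates. One possible way to get the fill-from-previous behaviour asked for in the question is to reindex onto the 25th of every month with method="ffill". The following is only a sketch under assumptions: the helper name sample_on_day is made up for illustration, the column names date, a, b, c, d are taken from the output above, dates are assumed unique, and day is assumed to be 28 or less so that it exists in every month.

```python
import pandas as pd

# Subset of the rows from the question; 2020-11-25 is missing on purpose.
rows = [
    ("2020-09-24", 1.0990, 1.1000, 1.0982, 1.0991),
    ("2020-09-25", 1.1018, 1.1025, 1.0964, 1.0995),
    ("2020-09-26", 1.1011, 1.1020, 1.1009, 1.1018),
    ("2020-10-24", 1.1045, 1.1068, 1.0995, 1.1017),
    ("2020-10-25", 1.1031, 1.1074, 1.1021, 1.1045),
    ("2020-10-26", 1.1071, 1.1076, 1.1012, 1.1031),
    ("2020-11-23", 1.1005, 1.1075, 1.0989, 1.1071),
    ("2020-11-26", 1.1079, 1.1086, 1.0992, 1.1005),
    ("2020-11-27", 1.1076, 1.1087, 1.1068, 1.1079),
]
df = pd.DataFrame(rows, columns=["date", "a", "b", "c", "d"])
df["date"] = pd.to_datetime(df["date"])


def sample_on_day(df, day=25):
    """Hypothetical helper: one row per month on `day`, taking the most
    recent earlier row when that exact date is absent (assumes day <= 28)."""
    # reindex(method="ffill") needs a sorted index with unique dates
    indexed = df.sort_values("date").set_index("date")
    first, last = indexed.index[0], indexed.index[-1]
    # First day of every month covered by the data
    month_starts = pd.date_range(first.normalize().replace(day=1), last, freq="MS")
    # Shift each month start to the requested day of the month
    targets = month_starts + pd.Timedelta(days=day - 1)
    # Keep only target dates that fall inside the observed date range
    targets = targets[(targets >= first) & (targets <= last)]
    # For each target date, take the row at or immediately before it
    return indexed.reindex(targets, method="ffill")


result = sample_on_day(df, day=25)
print(result)
```

With this sample, the row labelled 2020-11-25 carries the values from 2020-11-23, which matches the desired output in the question. If a whole month has no data, the row is filled from an earlier month, so it may be worth checking how stale the source row is.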