我有一个包含两列的df:
index start_date end_date
0 2000-01-03 2000-01-20
1 2000-01-04 2000-01-31
2 2000-01-05 2000-02-02
3 2000-01-05 2000-02-17
...
5100 2020-12-29 2021-01-11
5111 2020-12-30 2021-03-15
我想在开始日期和结束日期之间添加所有月结束日期的列,这样,如果end_date在一个月的中间,我仍然会考虑这个月的结束。所以,我的df看起来是这样的:
index start_date end_date first_monthend second_monthend third_monthend fourth_monthend
0 2000-01-03 2000-01-20 2000-01-31 0 0 0
1 2000-01-04 2000-01-31 2000-01-31 0 0 0
2 2000-01-05 2000-02-02 2000-01-31 2000-02-28 0 0
3 2000-01-05 2000-02-17 2000-01-31 2000-02-28 0 0
... ... ... ... ... ...
5100 2020-12-29 2021-02-11 2020-12-31 2021-01-31 2021-02-28 0
5111 2020-12-30 2021-03-15 2020-12-31 2021-01-31 2021-02-28 2021-03-31
I would be very grateful if you could help me
如果需要解析开始和结束日期时间之间的月份,并添加每个月的最后一天,请使用带有period_range
:的自定义lambda函数
df['start_date'] = pd.to_datetime(df['start_date'])
df['end_date'] = pd.to_datetime(df['end_date'])
def f(x):
r = pd.period_range(x['start_date'],
x['end_date'], freq='m').to_timestamp(how='end').normalize()
return pd.Series(r)
df = df.join(df.apply(f, axis=1).fillna(0).add_suffix('_monthend'))
print (df)
start_date end_date 0_monthend 1_monthend
0 2000-01-03 2000-01-20 2000-01-31 0
1 2000-01-04 2000-01-31 2000-01-31 0
2 2000-01-05 2000-02-02 2000-01-31 2000-02-29 00:00:00
3 2000-01-05 2000-02-17 2000-01-31 2000-02-29 00:00:00
5100 2020-12-29 2021-01-11 2020-12-31 2021-01-31 00:00:00
5111 2020-12-30 2021-03-15 2020-12-31 2021-01-31 00:00:00
2_monthend 3_monthend
0 0 0
1 0 0
2 0 0
3 0 0
5100 0 0
5111 2021-02-28 00:00:00 2021-03-31 00:00:00
如果未用0
:替换缺失值
df = df.join(df.apply(f, axis=1).add_suffix('_monthend'))
print (df)
start_date end_date 0_monthend 1_monthend 2_monthend 3_monthend
0 2000-01-03 2000-01-20 2000-01-31 NaT NaT NaT
1 2000-01-04 2000-01-31 2000-01-31 NaT NaT NaT
2 2000-01-05 2000-02-02 2000-01-31 2000-02-29 NaT NaT
3 2000-01-05 2000-02-17 2000-01-31 2000-02-29 NaT NaT
5100 2020-12-29 2021-01-11 2020-12-31 2021-01-31 NaT NaT
5111 2020-12-30 2021-03-15 2020-12-31 2021-01-31 2021-02-28 2021-03-31