All month ends up to the end date



I have a df with two columns:

index   start_date   end_date
0       2000-01-03   2000-01-20
1       2000-01-04   2000-01-31
2       2000-01-05   2000-02-02 
3       2000-01-05   2000-02-17
...
5100    2020-12-29   2021-01-11
5111    2020-12-30   2021-03-15  

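I would like to add columns containing all month-end dates between the start date and the end date, so that if end_date falls in the middle of a month, the end of that month is still included. My df would then look like this: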

index   start_date   end_date     first_monthend   second_monthend third_monthend fourth_monthend
0       2000-01-03   2000-01-20   2000-01-31         0               0              0
1       2000-01-04   2000-01-31   2000-01-31         0               0              0
2       2000-01-05   2000-02-02   2000-01-31         2000-02-28      0              0
3       2000-01-05   2000-02-17   2000-01-31         2000-02-28      0              0
... ... ... ... ... ...
5100    2020-12-29   2021-02-11   2020-12-31         2021-01-31      2021-02-28     0
5111    2020-12-30   2021-03-15   2020-12-31         2021-01-31      2021-02-28    2021-03-31 
I would be very grateful if you could help me.

If you need to work out the months between the start and end datetimes and add the last day of each month, use a custom function with period_range:

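import pandas as pd

df['start_date'] = pd.to_datetime(df['start_date'])
df['end_date'] = pd.to_datetime(df['end_date'])

def f(x):
    # Monthly periods from start to end, each converted to its last day
    r = pd.period_range(x['start_date'],
                        x['end_date'], freq='M').to_timestamp(how='end').normalize()
    return pd.Series(r)

# Rows spanning fewer months get missing values, replaced here by 0
df = df.join(df.apply(f, axis=1).fillna(0).add_suffix('_monthend'))
print(df)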
start_date   end_date 0_monthend           1_monthend  
0    2000-01-03 2000-01-20 2000-01-31                    0   
1    2000-01-04 2000-01-31 2000-01-31                    0   
2    2000-01-05 2000-02-02 2000-01-31  2000-02-29 00:00:00   
3    2000-01-05 2000-02-17 2000-01-31  2000-02-29 00:00:00   
5100 2020-12-29 2021-01-11 2020-12-31  2021-01-31 00:00:00   
5111 2020-12-30 2021-03-15 2020-12-31  2021-01-31 00:00:00   
2_monthend           3_monthend  
0                       0                    0  
1                       0                    0  
2                       0                    0  
3                       0                    0  
5100                    0                    0  
5111  2021-02-28 00:00:00  2021-03-31 00:00:00  

If the missing values are not replaced with 0:

df = df.join(df.apply(f, axis=1).add_suffix('_monthend'))
print(df)
start_date   end_date 0_monthend 1_monthend 2_monthend 3_monthend
0    2000-01-03 2000-01-20 2000-01-31        NaT        NaT        NaT
1    2000-01-04 2000-01-31 2000-01-31        NaT        NaT        NaT
2    2000-01-05 2000-02-02 2000-01-31 2000-02-29        NaT        NaT
3    2000-01-05 2000-02-17 2000-01-31 2000-02-29        NaT        NaT
5100 2020-12-29 2021-01-11 2020-12-31 2021-01-31        NaT        NaT
5111 2020-12-30 2021-03-15 2020-12-31 2021-01-31 2021-02-28 2021-03-31
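
For comparison, here is a self-contained sketch of the same idea built on date_range instead of period_range. It is not the method from the answer above, only one possible alternative. The names month_ends, wide and out are made up for illustration, and the sample rows are copied from the question. It assumes that rolling end_date forward with pd.offsets.MonthEnd(0) is an acceptable way to include the end of a partially covered month.

```python
import pandas as pd

# Three sample rows copied from the question.
df = pd.DataFrame(
    {
        "start_date": ["2000-01-03", "2000-01-05", "2020-12-30"],
        "end_date": ["2000-01-20", "2000-02-17", "2021-03-15"],
    }
)
df["start_date"] = pd.to_datetime(df["start_date"])
df["end_date"] = pd.to_datetime(df["end_date"])


def month_ends(row):
    # Roll end_date forward to the end of its month (a date that is
    # already a month end stays unchanged), then list every month end
    # from start_date up to that rolled date.
    last = row["end_date"] + pd.offsets.MonthEnd(0)
    rng = pd.date_range(row["start_date"], last, freq=pd.offsets.MonthEnd())
    return pd.Series(rng)


# Rows with fewer month ends are padded with NaT.
wide = df.apply(month_ends, axis=1).add_suffix("_monthend")
out = df.join(wide)
print(out)
```

Passing the offset object as freq avoids relying on a particular string alias for month end. The columns stay datetime typed as long as the missing values are left as NaT rather than replaced with 0.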
