How to calculate the number of days until the next date and since the last date from a list of dates in Python Pandas?



I have the following Pandas DataFrame in Python ("col1" has the datetime64 data type):

col1
--------
23-11-2020
25-05-2021
...

In addition, I have a list of special dates like the one below (the values in the list are of type "object", i.e. plain strings):

special_dates = ["25.11.2020", "23.11.2020", "01.06.2021", "20.05.2021", ...] 
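Because the special dates are stored as strings, they have to be parsed before any date arithmetic is possible. A minimal sketch, assuming the four dates used in the answers stand in for the full list (the real list is elided with ... above); the explicit format string rules out any day/month ambiguity:

```python
import pandas as pd

# The special dates arrive as plain strings (dtype "object"), day first.
special_dates = ["25.11.2020", "23.11.2020", "01.06.2021", "20.05.2021"]

# Parse with an explicit format, then sort: nearest-date lookups such as
# merge_asof or searchsorted require the dates in ascending order.
special = (pd.Series(pd.to_datetime(special_dates, format="%d.%m.%Y"))
           .sort_values(ignore_index=True))
print(special)
```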

I need to create 2 columns in my DataFrame:

  • col2 - number of days until the nearest upcoming date in the special_dates list
  • col3 - number of days since the most recent past date in the special_dates list

Please note that some months have 31 days and others 30, and that February has 28 days, or 29 in leap years.
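Month lengths and leap years need no special handling: subtracting two pandas Timestamps measures the gap on the actual calendar. A quick check (the dates are arbitrary examples, not from the question):

```python
import pandas as pd

# Timestamp subtraction yields a Timedelta computed on the real calendar,
# so leap days are counted automatically.
leap = (pd.Timestamp("2020-03-01") - pd.Timestamp("2020-02-28")).days      # 2020 is a leap year
non_leap = (pd.Timestamp("2021-03-01") - pd.Timestamp("2021-02-28")).days  # 2021 is not
print(leap, non_leap)  # 2 1
```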

So I need a result like this:

col1       | col2 | col3
-----------|------|------
23-11-2020 | 2    | 0
25-05-2021 | 7    | 5
...        | ...  | ...

How can I do this in Python Pandas?

Vectorized merge with merge_asof:

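import pandas as pd

special_dates = ["25.11.2020", "23.11.2020", "01.06.2021", "20.05.2021"]
df = pd.DataFrame({'col1': ["23.11.2020", "25.05.2021", "26.05.2021", "26.05.2022", "26.05.2018"]})

# merge_asof requires both sides to be sorted on their join keys
s = pd.Series(pd.to_datetime(special_dates, dayfirst=True)).sort_values()
df['col1'] = pd.to_datetime(df['col1'], dayfirst=True)
df = df.sort_values(by='col1').reset_index()  # 'index' keeps the original row order

# col2: days until the nearest special date on or after col1
df['col2'] = (pd.merge_asof(df, s.rename('other'),
                            left_on='col1', right_on='other',
                            direction='forward', allow_exact_matches=True)['other']
              .sub(df['col1']).dt.days
              )
# col3: days since the most recent special date on or before col1
df['col3'] = (pd.merge_asof(df, s.rename('other'),
                            left_on='col1', right_on='other',
                            direction='backward', allow_exact_matches=True)['other']
              .rsub(df['col1']).dt.days
              )
# restore the original row order
df = df.set_index('index').sort_index()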

Output:

             col1   col2   col3
index                          
0      2020-11-23    0.0    0.0
1      2021-05-25    7.0    5.0
2      2021-05-26    6.0    6.0
3      2022-05-26    NaN  359.0
4      2018-05-26  912.0    NaN
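With allow_exact_matches=True the forward merge returns 0 for 2020-11-23, whereas the table in the question expects 2 there. A sketch that should reproduce the question's table, assuming col2 is meant to count only special dates strictly after col1: set allow_exact_matches=False on the forward merge, which makes merge_asof match only strictly greater keys. The right side is built as a one-column DataFrame named "other" (an illustrative name) for clarity.

```python
import pandas as pd

special_dates = ["25.11.2020", "23.11.2020", "01.06.2021", "20.05.2021"]
s = pd.DataFrame({"other": pd.to_datetime(special_dates, dayfirst=True)}).sort_values("other")

df = pd.DataFrame({"col1": ["23.11.2020", "25.05.2021"]})
df["col1"] = pd.to_datetime(df["col1"], dayfirst=True)
df = df.sort_values("col1").reset_index(drop=True)  # merge_asof needs sorted keys

# Next special date strictly after col1 (exact matches excluded).
nxt = pd.merge_asof(df, s, left_on="col1", right_on="other",
                    direction="forward", allow_exact_matches=False)["other"]
# Most recent special date on or before col1 (exact matches allowed).
prev = pd.merge_asof(df, s, left_on="col1", right_on="other",
                     direction="backward", allow_exact_matches=True)["other"]

df["col2"] = (nxt - df["col1"]).dt.days
df["col3"] = (df["col1"] - prev).dt.days
print(df)
```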

Old answer (based on a misreading of the question)

You can use NumPy broadcasting:

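import pandas as pd

special_dates = ["25.11.2020", "23.11.2020", "01.06.2021", "20.05.2021"]
df = pd.DataFrame({'col1': ["23.11.2020", "25.05.2021"]})
df['col1'] = pd.to_datetime(df['col1'], dayfirst=True)
a = pd.to_datetime(special_dates, dayfirst=True).to_numpy()

# (n_rows, n_special) matrix: each special date minus each col1 value;
# negative differences (special date already in the past) are clipped to 0
out = (df
       .join(pd.DataFrame(a - df['col1'].to_numpy()[:, None],
                          index=df.index,
                          columns=range(1, len(special_dates)+1))
             .add_prefix('date_')
             .clip(lower=pd.Timedelta(0))
             #.apply(lambda c: c.dt.days)  # uncomment for days as int
             )
       )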

Output:

        col1  date_1  date_2    date_3    date_4
0 2020-11-23  2 days  0 days  190 days  178 days
1 2021-05-25  0 days  0 days    7 days    0 days

Output as integers (with the commented-out .apply line enabled):

col1  date_1  date_2  date_3  date_4
0 2020-11-23       2       0     190     178
1 2021-05-25       0       0       7       0

Variant with the dates as column headers:

out = (df
       .join(pd.DataFrame(a - df['col1'].to_numpy()[:, None],
                          index=df.index,
                          columns=special_dates)
             .clip(lower=pd.Timedelta(0))
             .apply(lambda c: c.dt.days)
             )
       )

Output:

col1  25.11.2020  23.11.2020  01.06.2021  20.05.2021
0 2020-11-23           2           0         190         178
1 2021-05-25           0           0           7           0
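The same broadcasting idea can also produce col2 and col3 directly, by masking the difference matrix and taking a row-wise minimum. A sketch, assuming col2 should count only special dates strictly after col1 (as in the question's expected table) while col3 includes an exact match:

```python
import numpy as np
import pandas as pd

special_dates = ["25.11.2020", "23.11.2020", "01.06.2021", "20.05.2021"]
special = pd.to_datetime(special_dates, dayfirst=True).to_numpy()

df = pd.DataFrame({"col1": pd.to_datetime(["23.11.2020", "25.05.2021"], dayfirst=True)})

# (n_rows, n_special) matrix of day offsets: special date minus col1, as floats.
diff = (special[None, :] - df["col1"].to_numpy()[:, None]) / np.timedelta64(1, "D")

# col2: smallest strictly positive offset (next special date after col1).
ahead = np.where(diff > 0, diff, np.inf).min(axis=1)
# col3: smallest non-positive offset, as a positive number of days.
behind = np.where(diff <= 0, np.abs(diff), np.inf).min(axis=1)

# Rows with no special date in that direction get NaN instead of inf.
df["col2"] = np.where(np.isinf(ahead), np.nan, ahead)
df["col3"] = np.where(np.isinf(behind), np.nan, behind)
print(df)
```

This builds the full n×m matrix, which is fine for short lists of special dates but grows with both dimensions.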

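This is probably not the best or most efficient approach, but you can use the days_between function from this post and compute the day differences in a loop. This gives you: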

import pandas as pd
import numpy as np
from datetime import datetime

def days_between(d1, d2):
    # d1 comes from col1 (dd-mm-yyyy), d2 from special_dates (dd.mm.yyyy)
    d1 = datetime.strptime(d1, "%d-%m-%Y")
    d2 = datetime.strptime(d2, "%d.%m.%Y")
    return (d2 - d1).days

df = pd.DataFrame({'col1': ["23-11-2020", "25-05-2021"]})
special_dates = ["25.11.2020", "23.11.2020", "01.06.2021", "20.05.2021"]

for idx, date in enumerate(df['col1']):
    col2 = np.inf  # smallest non-negative difference seen so far
    col3 = np.inf  # smallest non-positive difference, stored as a positive number
    for special_date in special_dates:
        delta = days_between(date, special_date)
        if 0 <= delta < col2:
            col2 = delta
        if 0 >= delta > -col3:
            col3 = -delta
    df.loc[df.index[idx], 'col2'] = col2
    df.loc[df.index[idx], 'col3'] = col3

# rows without a special date in one direction still hold inf; turn those into NaN
df.replace(np.inf, np.nan, inplace=True)
df[['col2', 'col3']].round(0)
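The loop above checks every special date for every row. A vectorized sketch of the same logic (delta >= 0 for col2, delta <= 0 for col3), swapping in numpy.searchsorted on the sorted special dates instead of the days_between helper:

```python
import numpy as np
import pandas as pd

special_dates = ["25.11.2020", "23.11.2020", "01.06.2021", "20.05.2021"]
# searchsorted requires the special dates in ascending order.
special = np.sort(pd.to_datetime(special_dates, format="%d.%m.%Y").to_numpy())

df = pd.DataFrame({"col1": ["23-11-2020", "25-05-2021"]})
dates = pd.to_datetime(df["col1"], format="%d-%m-%Y").to_numpy()
one_day = np.timedelta64(1, "D")

# Index of the first special date >= each date (delta >= 0 in the loop).
i_next = np.searchsorted(special, dates, side="left")
# Index of the last special date <= each date (delta <= 0 in the loop).
i_prev = np.searchsorted(special, dates, side="right") - 1

has_next = i_next < len(special)
has_prev = i_prev >= 0

# Clamp indices so the lookups stay in bounds; invalid rows become NaN below.
nxt = special[np.minimum(i_next, len(special) - 1)]
prv = special[np.maximum(i_prev, 0)]
df["col2"] = np.where(has_next, (nxt - dates) / one_day, np.nan)
df["col3"] = np.where(has_prev, (dates - prv) / one_day, np.nan)
print(df)
```

searchsorted performs a binary search per row, so it avoids both the Python-level double loop and the full n×m difference matrix of the broadcasting approach.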
