df1
no From To check
1 27-Jan-20 28-Mar-20 a
2 28-Mar-20 12-Apr-20 a
3 29-May-20 29-May-20 b
4 5-Apr-20 12-Apr-20 b
df2
col1 col2
a 9-Apr-20
b 30-Mar-20
df
no From To check total Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1 27-Jan-20 28-Mar-20 a 45 5 20 20
2 28-Mar-20 12-Apr-20 a 9 2 7
3 29-May-20 29-May-20 b 1 1
4 5-Apr-20 12-Apr-20 b 5 5
我需要计算两件事
- 柱";总计";基于";从";以及";致";包括df2的任何假期
- 将";总计";各月份的列(1月至12月列(
对于第1部分:列";合计";在df1中使用进行计算
np.busday_count('2020-01-27','2020-03-28')
但这并不准确,也不能将假日(df2(包括在内我尝试使用直接创建数据帧
df['total']=np.busday_count(df1['From'].astype('datetime64[D]')
,df1['To'].astype('datetime64[D]'))
但这是一个错误。
您可以在自定义函数中使用bdate_range
# dict of num to month mapping
months = pd.tseries.frequencies.MONTH_ALIASES
df2['col2'] = pd.to_datetime(df2['col2'], dayfirst=True)
# holiday month
df['holiday'] = df['check'].map(df2.set_index(['col1'])['col2']).dt.month
def count_by_month(s):
start, end, holiday = s['From'], s['To'], s['holiday']
valid_dates = pd.bdate_range(start=start, end=end).month
count = dict(pd.Series(valid_dates).value_counts())
# subtract holidays
if holiday in count:
count[holiday] -= 1
return pd.concat([s, pd.Series({v: count.get(k, 0) for k, v in months.items()})], axis=0)
print(df)
no From To check total holiday JAN FEB MAR APR
0 1 27-Jan-20 28-Mar-20 a 45 4 5 20 20 0
1 2 28-Mar-20 12-Apr-20 a 9 4 0 0 2 7
2 3 29-May-20 29-May-20 b 1 3 0 0 0 0
3 4 5-Apr-20 12-Apr-20 b 5 3 0 0 0 5
MAY JUN JUL AUG SEP OCT NOV DEC
0 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0
2 1 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0