Pandas数据框中两次约会之间的营业时间(包括节假日)



这里是Python新手用户-我正试图在熊猫数据框中计算两个日期之间的营业时间,该数据框给定周一至周五工作时间上午9点至下午5点,并排除澳大利亚公共假日。

在过去的几天里,我试图拼凑出很多解决方案,并将其应用于我的问题,但我遇到了很大的麻烦。

我将发布我当前的迭代,但也会寻求反馈,作为处理这一问题的最佳方式,并对未来如何解决这些问题有一些了解。

我的最后一次尝试是使用熊猫CDay,然后为澳大利亚日期创建一个自定义的节日日历,这一切似乎都很有效——然后从这一步开始将其应用于熊猫日期,我很难理解。我正在使用此中的自定义函数https://codereview.stackexchange.com/questions/135142/calculate-working-minutes-between-two-timestamps/135200#135200计算日期之间的分钟数但运气不佳的解决方案。

感谢您的帮助!

import datetime
from pandas.tseries.holiday import Holiday, AbstractHolidayCalendar
from pandas.tseries.offsets import CDay
class HolidayCalendar(AbstractHolidayCalendar):
rules =[Holiday('New Years Day',year=2016,month=1,day=1),
Holiday('Australia Day',year=2016,month=1,day=26),
Holiday('Good Friday',year=2016,month=3,day=25),
Holiday('Easter Monday',year=2016,month=3,day=28),
Holiday('ANZAC Day',year=2016,month=4,day=25),
Holiday('Queens Birthday',year=2016,month=6,day=13),
Holiday('Christmas Day',year=2016,month=12,day=25),
Holiday('Boxing Day',year=2016,month=12,day=26),           
Holiday('New Years Day',year=2017,month=1,day=1),
Holiday('Australia Day',year=2017,month=1,day=26),
Holiday('Good Friday',year=2017,month=4,day=15),
Holiday('Easter Monday',year=2017,month=4,day=17),
Holiday('ANZAC Day',year=2017,month=4,day=25),
Holiday('Queens Birthday',year=2017,month=6,day=12),
Holiday('Christmas Day',year=2017,month=12,day=25),
Holiday('Boxing Day',year=2017,month=12,day=26),
Holiday('New Years Day',year=2018,month=1,day=1),
Holiday('Australia Day',year=2018,month=1,day=26),
Holiday('Good Friday',year=2018,month=3,day=30),
Holiday('Easter Monday',year=2018,month=4,day=2),
Holiday('ANZAC Day',year=2018,month=4,day=25),
Holiday('Queens Birthday',year=2018,month=6,day=11),
Holiday('Christmas Day',year=2018,month=12,day=25),
Holiday('Boxing Day',year=2018,month=12,day=26)]
cal = HolidayCalendar()
dayindex = pd.bdate_range(datetime.date(2015,1,1),datetime.date.today(),freq=CDay(calendar=cal))
day_series = dayindex.to_series()
def count_mins(start,end):
starttime = datetime.datetime.fromtimestamp(int(start)/1000)
endtime = datetime.datetime.fromtimestamp(int(end)/1000)
days = day_series[starttime.date():endtime.date()]
daycount = len(days)
if daycount == 0:
return daycount
else:
startday = datetime.datetime(days[0].year,
days[0].month,
days[0].day,
hour=9,
minute=0)
endday = datetime.datetime(days[-1].year,
days[-1].month,
days[-1].day,
hour=17,
minute=0)
if daycount == 1:  
if starttime < startday:
periodstart = startday
else:
periodstart = starttime
if endtime > endday:
periodend = endday
else:
periodend = endtime
return (periodend - periodstart).seconds/60
if daycount == 2:
if starttime < startday:
first_day_mins = 480
else:
first_day_mins = (startday.replace(hour=17)-starttime).seconds/60
if endtime > endday:
second_day_mins = 480
else:
second_day_mins = (endtime-endday.replace(hour=9)).seconds/60
return (first_day_mins + second_day_mins)
else:
if starttime < startday:
first_day_mins = 480
else:
first_day_mins = (startday.replace(hour=17)-starttime).seconds/60
if endtime > endday:
second_day_mins = 480
else:
second_day_mins = (endtime-endday.replace(hour=9)).seconds/60
return (first_day_mins + second_day_mins + ((daycount-2)*480))

df_updated['Created Date'] = pd.to_datetime(df_updated['Created Date'])
df_updated['Updated Date'] = pd.to_datetime(df_updated['Updated Date'])
df_updated['Created Date'] = df_updated['Created Date'].astype(np.int64) / 
int(1e6)
df_updated['Updated Date'] = df_updated['Updated Date'].astype(np.int64) / 
int(1e6)
count_mins(df_updated['Created Date'], df_updated['Updated Date'])

在PyPi 中试用这个名为业务持续时间的包

pip安装业务持续时间

示例代码:

from business_duration import businessDuration
import pandas as pd
from datetime import time,datetime
import holidays as pyholidays
startdate = pd.to_datetime('2017-01-01 00:00:00')
enddate = pd.to_datetime('2017-01-31 23:00:00')
starttime=time(9,0,0)
endtime=time(17,0,0)
holidaylist = pyholidays.Australia()
unit='hour'
#By default weekends are Saturday and Sunday
print(businessDuration(startdate,enddate,starttime,endtime,holidaylist=holidayli
st,unit=unit))
Output: 160.0
holidaylist:
{datetime.date(2017, 1, 1): "New Year's Day",
datetime.date(2017, 1, 2): "New Year's Day (Observed)",
datetime.date(2017, 1, 26): 'Australia Day',
datetime.date(2017, 3, 6): 'Canberra Day',
datetime.date(2017, 4, 14): 'Good Friday',
datetime.date(2017, 4, 15): 'Easter Saturday',
datetime.date(2017, 4, 17): 'Easter Monday',
datetime.date(2017, 4, 25): 'Anzac Day',
datetime.date(2017, 6, 12): "Queen's Birthday",
datetime.date(2017, 9, 26): 'Family & Community Day',
datetime.date(2017, 10, 2): 'Labour Day',
datetime.date(2017, 12, 25): 'Christmas Day',
datetime.date(2017, 12, 26): 'Boxing Day'}

您可以使用bdate_range:的长度

In [11]: pd.bdate_range('2017-01-01', '2017-10-23')
Out[11]:
DatetimeIndex(['2017-01-02', '2017-01-03', '2017-01-04', '2017-01-05',
'2017-01-06', '2017-01-09', '2017-01-10', '2017-01-11',
'2017-01-12', '2017-01-13',
...
'2017-10-10', '2017-10-11', '2017-10-12', '2017-10-13',
'2017-10-16', '2017-10-17', '2017-10-18', '2017-10-19',
'2017-10-20', '2017-10-23'],
dtype='datetime64[ns]', length=211, freq='B')
In [12]: len(pd.bdate_range('2017-01-01', '2017-10-23'))
Out[12]: 211

我建议一个更简单的解决方案

import pandas as pd  
from datetime import datetime
weekmask = 'Sun Mon Tue Wed Thu'
exclude = [pd.datetime(2020, 5, 1),
pd.datetime(2020, 5, 2),
pd.datetime(2020, 5, 3)]
pd.bdate_range('2020/4/30','2020/5/26',
freq='C',
weekmask = weekmask,
holidays=exclude)

最新更新