Pandas: calculate the time in minutes between two columns, excluding weekends and public holidays and counting only business hours



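I have the following problem and feel I am only a few steps away from solving it, but I am not experienced enough yet. I am using the business-duration package. I have already looked at other similar answers and tried many approaches, but this is the closest I have come (using this answer). I am using Anaconda and Spyder, which is the only setup available on my work laptop at the moment. I was not able to install some of the custom business-day functions into Anaconda.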

I have a large dataset (about 200,000 rows) that I need to solve this for:

import pandas as pd
import business_duration as bd
import datetime as dt
import holidays as pyholidays

# Specify business working hours (8am - 5pm)
Bus_start_time = dt.time(8, 0, 0)
Bus_end_time = dt.time(17, 0, 0)
holidaylist = pyholidays.ZA()
unit = 'min'

# Named `data` rather than `list` so the built-in list() stays usable later
data = [[10, '2022-01-01 07:00:00', '2022-01-08 15:00:00'],
        [11, '2022-01-02 18:00:00', '2022-01-10 15:30:00'],
        [12, '2022-01-01 09:15:00', '2022-01-08 12:00:00'],
        [13, '2022-01-07 13:00:00', '2022-01-23 17:00:00']]
df = pd.DataFrame(data, columns=['ID', 'Start', 'End'])
print(df)

Which gives:

   ID                Start                  End
0  10  2022-01-01 07:00:00  2022-01-08 15:00:00
1  11  2022-01-02 18:00:00  2022-01-10 15:30:00
2  12  2022-01-01 09:15:00  2022-01-08 12:00:00
3  13  2022-01-07 13:00:00  2022-01-23 17:00:00

The next step was to test with a single pair of dates:

startdate = pd.to_datetime('2022-01-01 00:00:00')
enddate = pd.to_datetime('2022-01-14 23:00:00')
df['TimeAdj'] = bd.businessDuration(startdate, enddate, Bus_start_time, Bus_end_time,
                                    holidaylist=holidaylist, unit=unit)
print(df)

The result is:

   ID                Start                  End  TimeAdj
0  10  2022-01-01 07:00:00  2022-01-08 15:00:00   5400.0
1  11  2022-01-02 18:00:00  2022-01-10 15:30:00   5400.0
2  12  2022-01-01 09:15:00  2022-01-08 12:00:00   5400.0
3  13  2022-01-07 13:00:00  2022-01-23 17:00:00   5400.0

For some reason the values are shown as floats, but I can fix that later. Next, I need to run this calculation for every row in the dataframe.

I tried passing the df columns as the start and end dates, but got an error:

startdate = df['Start']
enddate = df['End']
print(bd.businessDuration(startdate, enddate, Bus_start_time, Bus_end_time,
                          holidaylist=holidaylist, unit=unit))

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I then checked the business-duration documentation and adjusted the code to the following:

from itertools import repeat
df['TimeAdj'] = list(map(bd.businessDuration, startdate, enddate,
                         repeat(Bus_start_time), repeat(Bus_end_time),
                         repeat(holidaylist), repeat(unit)))

AttributeError: 'str' object has no attribute 'date'

I expect to get the correct value in each row of the TimeAdj column (example added):

   ID                Start                  End  TimeAdj
0  10  2022-01-01 07:00:00  2022-01-08 15:00:00     2300
1  11  2022-01-02 18:00:00  2022-01-10 15:30:00     2830
2  12  2022-01-01 09:15:00  2022-01-08 12:00:00     2115
3  13  2022-01-07 13:00:00  2022-01-23 17:00:00     4800

What do I need to adjust?

Use:

from functools import partial

# Convert strings to datetime
df['Start'] = pd.to_datetime(df['Start'])
df['End'] = pd.to_datetime(df['End'])

# Get holidays list
years = range(df['Start'].min().year, df['End'].max().year + 1)
holidaylist = pyholidays.ZA(years=years).keys()

# Create a partial function as a shortcut
bduration = partial(bd.businessDuration,
                    starttime=Bus_start_time, endtime=Bus_end_time,
                    holidaylist=holidaylist, unit=unit)

# Compute business duration
df['TimeAdj'] = df.apply(lambda x: bduration(x['Start'], x['End']), axis=1)
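
Why this helps can be seen in a small sketch that reproduces both errors from the question using pandas alone. It assumes that business_duration inspects its start and end arguments one value at a time (calling .date() on them and comparing them in plain if statements); that is an inference from the two error messages, not something checked against the library's source. The two-row frame and the variable names are illustrative only.

```python
import pandas as pd

# Illustrative two-row frame with the same string layout as the question.
df = pd.DataFrame({
    "Start": ["2022-01-01 07:00:00", "2022-01-02 18:00:00"],
    "End": ["2022-01-08 15:00:00", "2022-01-10 15:30:00"],
})

# 1) Cells created from text are plain str objects, which have no .date().
first_raw = df["Start"].iloc[0]
try:
    first_raw.date()
except AttributeError as exc:
    str_error = str(exc)

# 2) After conversion each cell is a pandas Timestamp, which does have .date().
df["Start"] = pd.to_datetime(df["Start"])
df["End"] = pd.to_datetime(df["End"])
first_day = df["Start"].iloc[0].date()

# 3) Comparing whole columns yields a boolean Series, and an `if` cannot
#    reduce that to a single True/False, so pandas raises ValueError.
try:
    if df["Start"] > df["End"]:
        pass
except ValueError as exc:
    series_error = str(exc)

# 4) Taken row by row, the same comparison is an ordinary bool per row,
#    which is what a row-wise df.apply(..., axis=1) hands to the function.
row_flags = [bool(start > end) for start, end in zip(df["Start"], df["End"])]
```

In other words, the conversion with pd.to_datetime addresses the AttributeError, and calling the function once per row addresses the ValueError.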

Output:

>>> df
   ID               Start                 End  TimeAdj
0  10 2022-01-01 07:00:00 2022-01-08 15:00:00   2700.0
1  11 2022-01-02 18:00:00 2022-01-10 15:30:00   3150.0
2  12 2022-01-01 09:15:00 2022-01-08 12:00:00   2700.0
3  13 2022-01-07 13:00:00 2022-01-23 17:00:00   5640.0
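
The figures in this output can be cross-checked without business_duration. The sketch below is a hand-rolled stand-in using only the standard library, not the package's own algorithm: the helper name business_minutes is hypothetical, weekends are assumed to be Saturday and Sunday, and the holiday set is typed in by hand (only New Year's Day 2022, which falls on a Saturday anyway) rather than taken from the holidays package.

```python
import datetime as dt


def business_minutes(start, end, open_time, close_time, holidays=frozenset()):
    """Minutes between start and end that fall inside [open_time, close_time]
    on Monday-Friday dates that are not in `holidays` (a set of dates)."""
    if end <= start:
        return 0.0
    total = dt.timedelta()
    day = start.date()
    while day <= end.date():
        if day.weekday() < 5 and day not in holidays:
            # Clip the business window of this day to the [start, end] range.
            window_open = dt.datetime.combine(day, open_time)
            window_close = dt.datetime.combine(day, close_time)
            lower = max(start, window_open)
            upper = min(end, window_close)
            if upper > lower:
                total += upper - lower
        day += dt.timedelta(days=1)
    return total.total_seconds() / 60


fmt = "%Y-%m-%d %H:%M:%S"
rows = [
    ("2022-01-01 07:00:00", "2022-01-08 15:00:00"),
    ("2022-01-02 18:00:00", "2022-01-10 15:30:00"),
    ("2022-01-01 09:15:00", "2022-01-08 12:00:00"),
    ("2022-01-07 13:00:00", "2022-01-23 17:00:00"),
]
hand_entered_holidays = {dt.date(2022, 1, 1)}  # assumption: entered by hand

results = [
    business_minutes(
        dt.datetime.strptime(start, fmt),
        dt.datetime.strptime(end, fmt),
        dt.time(8, 0),
        dt.time(17, 0),
        hand_entered_holidays,
    )
    for start, end in rows
]
print(results)  # [2700.0, 3150.0, 2700.0, 5640.0]
```

For the four sample rows this agrees with the TimeAdj column above: a full business day is 540 minutes, so row 0 covers the five weekdays of 3 to 7 January (5 × 540 = 2700), and row 1 adds 450 minutes on Monday 10 January.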
