Python datetime: simplifying 24-hour time handling and totaling a DataFrame



I am using Python 3.8.0 on 64-bit Windows.

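I have a long string containing 24-hour times, and I want to add up the total hours. The code below shows the logic. Can I simplify this by using the datetime module?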

With the existing code, hrs_finish <= hrs_start can occur. If it does, how can I check for it and add 24 hours to hrs_finish? The current script is below:

import re
import pandas as pd
#String extracted from pdf using PyPDF2
pdfstring = 'Utilities Dd Tmp No/s0417 937 023Equipment / Additional ChargesnCrew SpeciÞc InstructionsnBrisbane1/23 Darnick St, Underwood  QLD  4119nPhone: 07 3841  7773  Fax: 07 3841 2229nBerkley Heathn8:0017:450:30Yesn141YesnYesnTyson Trindalln8:3015:300:30YesnYesnYesnNaser Bin Khaleeln10:3015:300:30YesnYesnYesnBernard Macinnisn15:3017:450:00YesnYesnYesnTsz Ching Suenn15:3017:450:00YesnYesnYesnClient Not On Site - Authorisation Signaturen CLIENT NOT ON SITE'
#Finding hours from pdfstring using regex pattern of digits and ':' to read and locate
regexhours = re.compile(r'(\d\d*):(\d\d)(\d\d):(\d\d)(\d\d*):(\d\d)')
hours_sublist = regexhours.findall(pdfstring)
#converts listed strings into integers so can subtract
hrs1 = [list(map(int, x)) for x in hours_sublist]
#creates dataframe using pandas as pd
df = pd.DataFrame(hrs1)
print(df)
#How can I check hrs_finish <= hrs_start?
hrs_start = df[0]*60+df[1]
hrs_finish = df[2]*60+df[3]
#calculating total hours = Finish - Start times
df_total = (hrs_finish - hrs_start)/60
print(df_total)
#Total hours worked
Hours_worked = sum(df_total)
print(Hours_worked)

You can do it like this:

df['start'] = pd.to_datetime(df[0]*60 + df[1], unit='m')
df['finish'] = pd.to_datetime(df[2]*60 + df[3], unit='m')
(df['finish'] - df['start']).sum()
# Timedelta('1 days 02:15:00')

Or you can get the total as a number of hours:

(df['finish'] - df['start']).sum() / pd.Timedelta('1 hour')
# 26.25
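As a variation on the approach above, the same total can be computed with durations since midnight instead of datetimes. This is only a minimal sketch: the sample rows are typed in by hand to mirror the first four columns of the question's DataFrame, and the names start, finish, total and total_hours are illustrative, not part of the original code.

```python
import pandas as pd

# Sample rows in the same layout as the question's DataFrame:
# columns 0-1 are start hour/minute, columns 2-3 are finish hour/minute.
df = pd.DataFrame(
    [[8, 0, 17, 45], [8, 30, 15, 30], [10, 30, 15, 30],
     [15, 30, 17, 45], [15, 30, 17, 45]]
)

# Express each time as a duration since midnight (minutes -> Timedelta).
start = pd.to_timedelta(df[0] * 60 + df[1], unit="m")
finish = pd.to_timedelta(df[2] * 60 + df[3], unit="m")

# Sum the per-row durations, then convert the total to decimal hours.
total = (finish - start).sum()
total_hours = total / pd.Timedelta(hours=1)

print(total)        # 1 days 02:15:00
print(total_hours)  # 26.25
```

Working with Timedelta values avoids attaching the arbitrary 1970-01-01 date that pd.to_datetime adds, but the result is the same either way.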

Edit

To handle a finish time that falls on the next day (the 24-hour wrap-around), you can use:

(df['finish'] + (df['finish'] < df['start']).astype('timedelta64[D]') - df['start']).sum()
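The cast to 'timedelta64[D]' may not be accepted by every pandas version, so here is a hedged sketch of the same idea that uses pd.to_timedelta instead. The two sample rows are hypothetical (the second is an overnight shift), and the sketch assumes that any row whose finish is not after its start crosses midnight exactly once.

```python
import pandas as pd

# Hypothetical rows: the second shift starts at 22:00 and ends at 06:00 the next day.
df = pd.DataFrame([[8, 0, 17, 45], [22, 0, 6, 0]])

# Durations since midnight for start and finish.
start = pd.to_timedelta(df[0] * 60 + df[1], unit="m")
finish = pd.to_timedelta(df[2] * 60 + df[3], unit="m")

# Assumption: finish <= start means the shift ran past midnight,
# so add one day to the finish time for those rows only.
crosses_midnight = finish <= start
finish = finish + pd.to_timedelta(crosses_midnight.astype(int), unit="D")

hours = (finish - start) / pd.Timedelta(hours=1)
total_hours = hours.sum()

print(hours.tolist())  # [9.75, 8.0]
print(total_hours)     # 17.75
```

Note that `<=` follows the condition in the question, so a row with identical start and finish times counts as a full 24 hours; use `<` if such rows should count as zero.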
