Categorize datetime intervals by other time ranges and put the values in columns



I'm dealing with a tough challenge and I don't know how to solve it.

I have a data frame like this:

Product_Name        Start_Time               End_Time
Product X      2021-10-20 20:32:00     2021-10-21 03:50:00 
Product Y      2021-10-21 11:50:00     2021-10-21 16:00:00 
Product Z      2022-01-11 20:10:00     2022-01-12 15:30:00

I have 3 time ranges, each with a category:

A: 05:01 to 14:00
B: 14:01 to 22:00
C: 22:01 to 05:00

What I'd like to do is use "Start_Time" and "End_Time" to calculate how many decimal hours fall in each category (A, B and C), to get something like this:

Product_Name        Start_Time               End_Time           A        B       C
Product X      2021-10-20 20:30:00     2021-10-21 03:50:00    0.00     1.50    5.82
Product Y      2021-10-21 11:50:00     2021-10-21 16:00:00    2.17     1.98    0.00
Product Z      2022-01-11 20:10:00     2022-01-12 15:30:00    8.98     3.31    6.98

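Can you help me with how to do this?

I'm a beginner with python, pandas, etc., and when I first wrote this post I really didn't know how to start coding it. So I started thinking it through and came up with this code. I'm sure it isn't right, but I think it's a start: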

start_a = 05:01:00
end_a = 14:00:00
start_b = 14:01:00
end_b = 22:00:00
start_c = 22:01:00
end_c = 05:00:00

if df['Start_Time'] > start_a and df['End_Time'] < end_a:
    df['A'] = ( df['End_Time'] - start_a ) - ( end_a - df['Start_Time'] )
else:
    df['A'] = 0

if df['Start_Time'] > start_b and df['End_Time'] < end_b:
    df['B'] = ( df['End_Time'] - start_b ) - ( end_b - df['Start_Time'] )
else:
    df['B'] = 0

if df['Start_Time'] > start_c and df['End_Time'] < end_c:
    df['C'] = ( df['End_Time'] - start_c ) - ( end_c - df['Start_Time'] )
else:
    df['C'] = 0

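Your problem is a lot harder than I thought. One thing to note is that Start_Time and End_Time can have different dates. Also, category C spans two days. These two facts make the code a bit more complex, but it seems to work.

First, the setup for your problem. I created your data frame and the variables. It is important that these structures have the correct data types.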

import pandas as pd
from io import StringIO
from datetime import datetime, timedelta

# Create your data frame
data = StringIO("""Product_Name     Start_Time     End_Time
Product X      2021-10-20 20:32:00     2021-10-21 03:50:00
Product Y      2021-10-21 11:50:00     2021-10-21 16:00:00
Product Z      2022-01-11 20:10:00     2022-01-12 15:30:00""")
# Columns are separated by runs of two or more spaces
df = pd.read_csv(data, sep=r'\s{2,}', engine='python')
# Convert the columns to datetime format
df[["Start_Time", "End_Time"]] = df[["Start_Time", "End_Time"]].apply(pd.to_datetime)
# Create the range start and end times as datetime objects (only the time part is used later)
start_a = datetime.strptime('05:01:00', '%H:%M:%S')
end_a = datetime.strptime('14:00:00', '%H:%M:%S')
start_b = datetime.strptime('14:01:00', '%H:%M:%S')
end_b = datetime.strptime('22:00:00', '%H:%M:%S')
start_c = datetime.strptime('22:01:00', '%H:%M:%S')
end_c = datetime.strptime('05:00:00', '%H:%M:%S')

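Then I created a function that calculates the hours for your problem. start and end are the times that define one range. The function iterates over the days and checks how much of the range fits into the interval. Usually it needs only one iteration, but your Product Z spans two days, so it needs two.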

def calc_hours(start_time, end_time, start, end):
    # Give the range a date as well => allows us to compare it to start_time and end_time
    range_start = datetime.combine(start_time.date(), start.time())
    range_end = datetime.combine(start_time.date(), end.time())
    # Special case for range C: it wraps past midnight, so let it begin on the
    # previous day; otherwise an interval starting after midnight would be missed
    if range_end < range_start:
        range_start = range_start - timedelta(days=1)
    # start_time and end_time can span one or more days => iterate over the days and sum the hours in the range
    total_hours = 0.0
    while range_start < end_time:
        # Overlap in hours, or zero if the range is not within the time frame
        hours_in_frame = max((min(range_end, end_time) - max(range_start, start_time)).total_seconds(), 0) / 3600
        total_hours += hours_in_frame

        # Move the range to the next day and check again
        range_start = range_start + timedelta(days=1)
        range_end = range_end + timedelta(days=1)

    return total_hours
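
The calculation inside the loop is the usual interval-overlap formula: take the later of the two starts and the earlier of the two ends, and clamp a negative difference to zero. Here is a minimal standalone sketch of just that step, using Product Y against range A on a single day. The helper name overlap_hours is only for illustration and is not part of the code above.

```python
from datetime import datetime

def overlap_hours(a_start, a_end, b_start, b_end):
    # Later start and earlier end bound the shared part of the two intervals
    seconds = (min(a_end, b_end) - max(a_start, b_start)).total_seconds()
    # A negative value means the intervals do not touch at all
    return max(seconds, 0) / 3600

# Product Y: 11:50 to 16:00 on 2021-10-21
job = (datetime(2021, 10, 21, 11, 50), datetime(2021, 10, 21, 16, 0))
# Range A on the same day: 05:01 to 14:00
window_a = (datetime(2021, 10, 21, 5, 1), datetime(2021, 10, 21, 14, 0))
# Range C starting that evening: 22:01 to 05:00 the next day
window_c = (datetime(2021, 10, 21, 22, 1), datetime(2021, 10, 22, 5, 0))

hours_a = overlap_hours(*job, *window_a)  # 11:50 to 14:00 is shared
hours_c = overlap_hours(*job, *window_c)  # no shared time
print(round(hours_a, 2), hours_c)
```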

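To use the function and add the results to the data frame, I used the apply() function from pandas. apply() takes each row of the data frame and uses the function shown above to calculate the hours in a given range. This is done for all three ranges.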

# Use apply to calculate the hours for each row and each range
df['A'] = df.apply(lambda x: calc_hours(x['Start_Time'], x['End_Time'], start_a, end_a), axis=1)
df['B'] = df.apply(lambda x: calc_hours(x['Start_Time'], x['End_Time'], start_b, end_b), axis=1)
df['C'] = df.apply(lambda x: calc_hours(x['Start_Time'], x['End_Time'], start_c, end_c), axis=1)

The output is almost what you wanted, but it is not rounded to two decimal places:

  Product_Name          Start_Time            End_Time         A         B         C
0    Product X 2021-10-20 20:32:00 2021-10-21 03:50:00  0.000000  1.466667  5.816667
1    Product Y 2021-10-21 11:50:00 2021-10-21 16:00:00  2.166667  1.983333  0.000000
2    Product Z 2022-01-11 20:10:00 2022-01-12 15:30:00  8.983333  3.316667  6.983333
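
If you want two decimals, DataFrame.round should be enough. This is a small sketch on a stand-in frame holding the A, B and C values printed above. The name hours is only for illustration; on the real data frame you would round those three columns in the same way.

```python
import pandas as pd

# Stand-in for the A/B/C columns computed above
hours = pd.DataFrame({
    "A": [0.000000, 2.166667, 8.983333],
    "B": [1.466667, 1.983333, 3.316667],
    "C": [5.816667, 0.000000, 6.983333],
})

# Round every column to two decimal places
rounded = hours.round(2)
print(rounded)
```

Note that rounding gives 3.32 for Product Z in column B, not the 3.31 shown in the question, which looks truncated rather than rounded.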

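Another approach is to create a Series containing the minute-of-day number of every minute in the range in question, and then intersect it with the same kind of Series for each row to get the overlapping duration.

I don't have time to give a full answer, but I thought I'd drop this idea and you can take it from there.

Create the reference series: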

import numpy as np
import pandas as pd

start = pd.Timestamp('22:01')
end = pd.Timestamp('05:00')
if end < start:
    end += pd.Timedelta(days=1)
# Minute of the day for every minute in range C
rng_c = pd.Series(pd.date_range(start=start, end=end, freq='min'))
drC = rng_c.dt.hour * 60 + rng_c.dt.minute

Create a function that performs the intersection and the duration calculation:

def intersecting_duration(x):
    # Minute of the day for every minute between Start_Time and End_Time
    rng = pd.Series(pd.date_range(start=x['Start_Time'], end=x['End_Time'], freq='min'))
    min_of_day = rng.dt.hour * 60 + rng.dt.minute

    # np.intersect1d returns unique values, so this assumes the interval is shorter than 24 hours
    dur_mins = len(np.intersect1d(min_of_day, drC))
    # n matching minute marks span n - 1 minutes
    return 0 if dur_mins == 0 else (dur_mins - 1) / 60

Then apply it:

df.apply(intersecting_duration, axis=1)
0    5.816667
1    0.000000
2    6.983333

You'll need to take it from there.
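
One possible way to take the minute-based idea further and cover all three categories is sketched below. It is a variation, not the code from the answer above: instead of np.intersect1d it counts the whole minutes of each interval that begin inside a category window, which also keeps repeated minutes when an interval is longer than a day. WINDOWS and hours_per_category are made-up names, and the sketch assumes that all timestamps fall on whole minutes.

```python
import numpy as np
import pandas as pd

# Category windows as minute-of-day [start, end); C wraps past midnight.
# WINDOWS and hours_per_category are illustrative names only.
WINDOWS = {
    "A": (5 * 60 + 1, 14 * 60),
    "B": (14 * 60 + 1, 22 * 60),
    "C": (22 * 60 + 1, 5 * 60),
}

def hours_per_category(start, end):
    # Number of whole minutes in [start, end); assumes minute-aligned timestamps
    n = int((end - start).total_seconds() // 60)
    # Minute of the day at which each of those minutes begins
    minute_of_day = (start.hour * 60 + start.minute + np.arange(n)) % 1440
    hours = {}
    for name, (lo, hi) in WINDOWS.items():
        if lo <= hi:
            inside = (minute_of_day >= lo) & (minute_of_day < hi)
        else:  # window wraps past midnight
            inside = (minute_of_day >= lo) | (minute_of_day < hi)
        hours[name] = inside.sum() / 60
    return pd.Series(hours)

df = pd.DataFrame({
    "Product_Name": ["Product X", "Product Y", "Product Z"],
    "Start_Time": pd.to_datetime(["2021-10-20 20:32", "2021-10-21 11:50", "2022-01-11 20:10"]),
    "End_Time": pd.to_datetime(["2021-10-21 03:50", "2021-10-21 16:00", "2022-01-12 15:30"]),
})

# One row of A/B/C hours per product, joined back onto the frame
per_category = df.apply(lambda r: hours_per_category(r["Start_Time"], r["End_Time"]), axis=1)
df = df.join(per_category)
print(df[["Product_Name", "A", "B", "C"]].round(2))
```

The single minutes between the windows (14:00, 22:00 and 05:00) belong to no category here, which matches how the ranges are defined in the question.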
