import os
import pandas as pd[enter image description here][1]
import datetime
df = pd.read_excel("m2.xlsx") # Imported excel file
print(df) # Printed the data output.
每列中都有0到9之间的数字。excel工作表中的日期(以列为单位(和时间(以行为单位(。现在我想要8和0的值的时间差出现在下面的数据中
Dates 04:00 A.M(time) 08:00A.M(time)
13-08-2018 8 1
14-08-2018 3 4
15-08-2018 8 9
16/08/2018 7 7
17/08/2018 9 8
18/08/2018 5 5
19/08/2018 0 6
20/08/2018 6 3
21/08/2018 2 5
22/08/2018 0 6
期望的结果如下:(应显示值的时间差(值8:的时间差
Date Value Time difference
13-08-2018 8 0 hrs
15-08-2018 8 48 hrs
17-08-2018 8 100 hrs
值0:的时间差
Date Value Time difference
19/08/2018 0 0 hrs
22/08/2018 0 72 hrs
要在python中加载的示例数据:
sample_data = [["13-08-2018",8,1],
["14-08-2018",3,4],
["15-08-2018",8,9],
["16/08/2018",7,7],
["17/08/2018",9,8],
["18/08/2018",5,5],
["19/08/2018",0,6],
["20/08/2018",6,3],
["21/08/2018",2,5],
["22/08/2018",0,6]]
df = pd.DataFrame(data=sample_data, columns=["Dates","04:00 A.M(time)","08:00 A.M(time)"])
df["Dates"] = pd.to_datetime(df["Dates"])
IIUC,您需要这样的函数:
def Time_Difference(df,value):
df_melt = df.melt(id_vars='Date',value_vars=['04:00 A.M','08:00 A.M'])
df_melt['dummy_date'] = pd.to_datetime(df_melt['Date'].dt.strftime('%Y-%m-%d')+' '+
df_melt.variable.str.extract("(d+:d+)")[0])
df_melt['Time difference'] = 0
mask = (df_melt['value']==value)
df_melt.sort_values('dummy_date',inplace=True)
df_melt.loc[mask,'Time difference'] = (df_melt.loc[mask,'dummy_date'].diff().fillna(0)/
np.timedelta64(1,'h'))
.cumsum().astype(str)+' hrs'
return df_melt.loc[mask,['Date','value','Time difference']].reset_index(drop=True)
df = pd.DataFrame({'Date':['13-08-2018','14-08-2018'],'04:00 A.M':[8,8],'08:00 A.M':[8,8]})
df['Date'] = pd.to_datetime(df['Date'])
Time_Difference(df,8)
Date value Time difference
0 2018-08-13 8 0.0 hrs
1 2018-08-13 8 4.0 hrs
2 2018-08-14 8 24.0 hrs
3 2018-08-14 8 28.0 hrs
解释:
步骤1:使用熔体将['04:00 A.M','08:00 A.M']
柱制成单柱:
print(df.melt(id_vars='Date',value_vars=['04:00 A.M','08:00 A.M']))
Date variable value
0 2018-08-13 04:00 A.M 8
1 2018-08-14 04:00 A.M 8
2 2018-08-13 08:00 A.M 8
3 2018-08-14 08:00 A.M 8
步骤2:创建dummy_date
并添加variable
列的值:
df_melt['dummy_date'] = pd.to_datetime(df_melt['Date'].dt.strftime('%Y-%m-%d')+' '+
df_melt.variable.str.extract("(d+:d+)")[0])
print(df_melt['dummy_date'])
0 2018-08-13 04:00:00
1 2018-08-14 04:00:00
2 2018-08-13 08:00:00
3 2018-08-14 08:00:00
Name: dummy_date, dtype: datetime64[ns]
步骤3:对给定的value
进行过滤并获得掩码:
mask = (df_melt['value']==value)
print(mask)
0 True
1 True
2 True
3 True
Name: value, dtype: bool
第4步:最后,取差值并填充na值,然后除以timedelta64。在此之后,取累计和,将数据类型更改为对象,并将小时数添加到值:
df_melt.loc[mask,'Time difference'] = (df_melt.loc[mask,'dummy_date'].diff().fillna(0)/
np.timedelta64(1,'h'))
.cumsum().astype(str)+' hrs'
0 0.0 hrs
2 4.0 hrs
1 24.0 hrs
3 28.0 hrs
Name: Time difference, dtype: object