价值观的时差?Pandas数据帧


import os
import pandas as pd[enter image description here][1]
import datetime
df = pd.read_excel("m2.xlsx") # Imported excel file 
print(df)   # Printed the data output.

每列中都有0到9之间的数字。excel工作表中的日期(以列为单位(和时间(以行为单位(。现在我想要8和0的值的时间差出现在下面的数据中

Dates            04:00 A.M(time)      08:00A.M(time)
13-08-2018           8                   1                
14-08-2018           3                   4                
15-08-2018           8                   9                
16/08/2018           7                   7                
17/08/2018           9                   8                
18/08/2018           5                   5                
19/08/2018           0                   6                  
20/08/2018           6                   3                 
21/08/2018           2                   5               
22/08/2018           0                   6   

期望的结果如下:(应显示值的时间差(值8:的时间差

Date        Value    Time difference
13-08-2018    8         0 hrs 
15-08-2018    8        48 hrs
17-08-2018    8        100 hrs

值0:的时间差

Date        Value     Time difference
19/08/2018    0         0 hrs
22/08/2018    0         72 hrs 

要在python中加载的示例数据:

sample_data = [["13-08-2018",8,1],
["14-08-2018",3,4],
["15-08-2018",8,9],
["16/08/2018",7,7],
["17/08/2018",9,8],
["18/08/2018",5,5],
["19/08/2018",0,6],
["20/08/2018",6,3],
["21/08/2018",2,5],
["22/08/2018",0,6]]
df = pd.DataFrame(data=sample_data, columns=["Dates","04:00 A.M(time)","08:00 A.M(time)"])
df["Dates"] = pd.to_datetime(df["Dates"])

IIUC,您需要这样的函数:

def Time_Difference(df,value):
df_melt = df.melt(id_vars='Date',value_vars=['04:00 A.M','08:00 A.M'])
df_melt['dummy_date'] = pd.to_datetime(df_melt['Date'].dt.strftime('%Y-%m-%d')+' '+
df_melt.variable.str.extract("(d+:d+)")[0])
df_melt['Time difference'] = 0
mask = (df_melt['value']==value)
df_melt.sort_values('dummy_date',inplace=True)
df_melt.loc[mask,'Time difference'] = (df_melt.loc[mask,'dummy_date'].diff().fillna(0)/
np.timedelta64(1,'h'))
.cumsum().astype(str)+' hrs'
return df_melt.loc[mask,['Date','value','Time difference']].reset_index(drop=True)
df = pd.DataFrame({'Date':['13-08-2018','14-08-2018'],'04:00 A.M':[8,8],'08:00 A.M':[8,8]})
df['Date'] = pd.to_datetime(df['Date'])
Time_Difference(df,8)
Date  value Time difference
0 2018-08-13      8         0.0 hrs
1 2018-08-13      8         4.0 hrs
2 2018-08-14      8        24.0 hrs
3 2018-08-14      8        28.0 hrs

解释:

步骤1:使用熔体将['04:00 A.M','08:00 A.M']柱制成单柱:

print(df.melt(id_vars='Date',value_vars=['04:00 A.M','08:00 A.M']))
Date   variable  value
0 2018-08-13  04:00 A.M      8
1 2018-08-14  04:00 A.M      8
2 2018-08-13  08:00 A.M      8
3 2018-08-14  08:00 A.M      8

步骤2:创建dummy_date并添加variable列的值:

df_melt['dummy_date'] = pd.to_datetime(df_melt['Date'].dt.strftime('%Y-%m-%d')+' '+
df_melt.variable.str.extract("(d+:d+)")[0])
print(df_melt['dummy_date'])
0   2018-08-13 04:00:00
1   2018-08-14 04:00:00
2   2018-08-13 08:00:00
3   2018-08-14 08:00:00
Name: dummy_date, dtype: datetime64[ns]

步骤3:对给定的value进行过滤并获得掩码:

mask = (df_melt['value']==value)
print(mask)
0    True
1    True
2    True
3    True
Name: value, dtype: bool

第4步:最后,取差值并填充na值,然后除以timedelta64。在此之后,取累计和,将数据类型更改为对象,并将小时数添加到值:

df_melt.loc[mask,'Time difference'] = (df_melt.loc[mask,'dummy_date'].diff().fillna(0)/
np.timedelta64(1,'h'))
.cumsum().astype(str)+' hrs'
0     0.0 hrs
2     4.0 hrs
1    24.0 hrs
3    28.0 hrs
Name: Time difference, dtype: object

相关内容

最新更新