Explaining the result of shift



I want to add a new column, Datadiff, that holds the difference between adjacent rows of Data in the DataFrame df:

        Id               Timestamp  Data  Timediff  Datadiff
696    697 2013-08-12 10:35:47.287  30.0     0.510      -1.0
885    886 2013-08-12 10:37:35.850  30.5    -0.203       5.0
886    887 2013-08-12 10:37:36.373  31.5     0.523       1.0
917    918 2013-08-12 10:37:45.137  31.5    -0.510      34.5
1018  1019 2013-08-12 11:17:13.570  25.0     0.000       0.0
1357  1358 2013-08-12 12:42:21.280  25.0    -0.347      28.0

I am using this code:

df['Timediff']= (df['Timestamp']-df['Timestamp'].shift(1)).dt.total_seconds()
df['Datadiff']= (df['Data']-df['Data'].shift(1))
df['Timestamp'] = pd.to_datetime(df['Timestamp'])
df = df[df['Data']>0]
df.head(500)

But the Datadiff column looks strange. How does shift(1) work? What is going wrong?

You need to reset the index and then apply diff():

df['Timestamp'] = pd.to_datetime(df['Timestamp'])
df = df.reset_index()
df['Timediff']= df['Timestamp'].diff().dt.total_seconds()
df['Datadiff']= df['Data'].diff()
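
As a minimal sketch of what shift(1) does (the toy Series below is made up for illustration and is not the asker's data): it moves every value down by one position, whatever the index labels are, so s - s.shift(1) gives the same result as s.diff().

```python
import pandas as pd

# Hypothetical toy data with a non-contiguous index, similar in shape to the question's frame
s = pd.Series([30.0, 30.5, 31.5, 25.0], index=[696, 885, 886, 1018])

shifted = s.shift(1)   # each value moves down one row; the first row becomes NaN
manual = s - shifted   # difference to the previous row, written out by hand
builtin = s.diff()     # the same computation via the built-in method

print(pd.DataFrame({"s": s, "shifted": shifted, "manual": manual, "builtin": builtin}))
```

The first row is always NaN because there is no previous row to subtract.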

For me it works fine, and a comparison with diff returns the same output:

df['Timestamp'] = pd.to_datetime(df['Timestamp'])
df = df[df['Data']>0]
df['Timediff1']= (df['Timestamp']-df['Timestamp'].shift(1)).dt.total_seconds()
df['Timediff2']= df['Timestamp'].diff().dt.total_seconds()
df['Datadiff1']= (df['Data']-df['Data'].shift(1))
df['Datadiff2']= df['Data'].diff()
print(df)
        Id               Timestamp  Data  Timediff  Datadiff  Timediff1
696    697 2013-08-12 10:35:47.287  30.0     0.510      -1.0        NaN
885    886 2013-08-12 10:37:35.850  30.5    -0.203       5.0    108.563
886    887 2013-08-12 10:37:36.373  31.5     0.523       1.0      0.523
917    918 2013-08-12 10:37:45.137  31.5    -0.510      34.5      8.764
1018  1019 2013-08-12 11:17:13.570  25.0     0.000       0.0   2368.433
1357  1358 2013-08-12 12:42:21.280  25.0    -0.347      28.0   5107.710

      Timediff2  Datadiff1  Datadiff2
696         NaN        NaN        NaN
885     108.563        0.5        0.5
886       0.523        1.0        1.0
917       8.764        0.0        0.0
1018   2368.433       -6.5       -6.5
1357   5107.710        0.0        0.0
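
One plausible explanation for the odd Datadiff values in the question, offered as an assumption because the full data is not shown: the differences there were computed before the df['Data'] > 0 filter, so each value refers to a previous row that was later dropped. A small sketch with made-up data shows how the order of the two steps changes the result:

```python
import pandas as pd

# Hypothetical data: rows 1 and 3 will be removed by the Data > 0 filter
df = pd.DataFrame({"Data": [30.0, 0.0, 30.5, -3.0, 31.5]})

# Diff first, filter second: differences are taken against rows that get dropped
before = df.copy()
before["Datadiff"] = before["Data"].diff()
before = before[before["Data"] > 0]

# Filter first, diff second: differences are taken between the rows that remain
after = df[df["Data"] > 0].copy()
after["Datadiff"] = after["Data"].diff()

print(before)
print(after)
```

In the first variant the surviving rows keep differences such as 34.5, which do not match their visible neighbours; in the second they are 0.5 and 1.0.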
