Pandas: get the previous row of a time series with a multi-level index



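I have a DataFrame with two columns in the index: one is a label and the other is a time-series period. I want to get, for each row, the previous row in its time series. But I can't use DataFrame.shift(), because with two levels in the index, shift mixes up the labels.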

# Desired behavior: each 'x' row needs its previous value, each 'y' row needs
# its previous value, etc. DON'T put the 'y' row's previous value on the 'x' row.
# The shift has to respect both index levels.
import pandas
x = pandas.DataFrame({'label': ['x', 'y', 'z', 'x', 'y', 'z', 'x', 'y', 'z'],
     'period': [1, 1, 1, 2, 2, 2, 3, 3, 3],
     'value': ['1st x', '1st y', '1st z', '2nd x', '2nd y', '2nd z', '3rd x', '3rd y', '3rd z']})
x.set_index(['label', 'period'], inplace=True)
#That looks like:
>>> x
             value
label period       
x     1       1st x
y     1       1st y
z     1       1st z
x     2       2nd x
y     2       2nd y
z     2       2nd z
x     3       3rd x
y     3       3rd y
z     3       3rd z
# I can't use x.shift(1) because that mixes the 'x', 'y' and 'z' values:
>>> x.shift(1)
              value
label period       
x     1         NaN
y     1       1st x  # WRONG! Should be NaN
z     1       1st y  # WRONG! Should be NaN
x     2       1st z  # WRONG! Should be '1st x'
y     2       2nd x  # WRONG! Should be '1st y'
z     2       2nd y  # WRONG! Should be '1st z'
x     3       2nd z  # WRONG! Should be '2nd x'
y     3       3rd x  # WRONG! Should be '2nd y'
z     3       3rd y  # WRONG! Should be '2nd z'

How do I get the correct previous row for each row?

If you group by the first index level, shift works as expected, because it is then applied within each label separately:

In [42]:
x.groupby(level='label').shift()
Out[42]:
              value
label period       
x     1         NaN
y     1         NaN
z     1         NaN
x     2       1st x
y     2       1st y
z     2       1st z
x     3       2nd x
y     3       2nd y
z     3       2nd z
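
If the previous value should sit next to the current one rather than replace it, the grouped shift can be assigned to a new column. The following is a minimal sketch under that assumption; the column name `prev_value` is made up for this example and does not appear in the question.

```python
import pandas as pd

# Same data as in the question.
x = pd.DataFrame({
    'label': ['x', 'y', 'z'] * 3,
    'period': [1, 1, 1, 2, 2, 2, 3, 3, 3],
    'value': ['1st x', '1st y', '1st z',
              '2nd x', '2nd y', '2nd z',
              '3rd x', '3rd y', '3rd z'],
}).set_index(['label', 'period'])

# Shift within each label group. The result keeps the original row order
# and index, so it can be assigned straight back to the frame.
# 'prev_value' is a hypothetical column name chosen for illustration.
x['prev_value'] = x.groupby(level='label')['value'].shift()

print(x)
```

This relies on the rows already being in period order within each label, as they are in the question; if they might not be, sorting the index first (for example with `sort_index`) is a reasonable precaution.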

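Also, if you want it in a more "readable" format, you can use DataFrame.unstack. It moves the labels into the columns, so row-wise operations such as diff act on one label at a time: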

# df is a frame indexed by ['label', 'period'], with numeric values
unstacked = df.unstack(level=0)
changes = unstacked.diff()

which, for the following data:

label period  value    
x     1       1
y     1       0
z     1       3
x     2       2
y     2       1
z     2       2
x     3       1
y     3       0
z     3       0

produces:

       value          
label      x    y    z
period                
1        NaN  NaN  NaN
2        1.0  1.0 -1.0
3       -1.0 -1.0 -2.0
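
Note that diff gives the change from the previous period, not the previous value itself. As a rough, self-contained sketch of both on the data above: the names `previous` and `changes` are just illustrative, and `level='label'` is used in place of `level=0`, which refers to the same level here.

```python
import pandas as pd

# The numeric data from the table above.
df = pd.DataFrame({
    'label': ['x', 'y', 'z'] * 3,
    'period': [1, 1, 1, 2, 2, 2, 3, 3, 3],
    'value': [1, 0, 3, 2, 1, 2, 1, 0, 0],
}).set_index(['label', 'period'])

# One row per period, one column per label.
unstacked = df.unstack(level='label')

# Previous period's value for each label: a plain shift is safe here
# because each column now holds a single label.
previous = unstacked.shift()

# Period-over-period change for each label.
changes = unstacked.diff()

print(previous)
print(changes)
```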
