I have a DataFrame:
import pandas as pd
df = pd.DataFrame.from_dict({'A': [10, 'foo'], 'B': [440, 'foo'], 'C': [790, 'bar'], 'D': [800, 'bar'], 'E': [7000, 'foo']}, orient='index', columns=['position', 'foobar'])
It looks like this:
position foobar
A 10 foo
B 440 foo
C 790 bar
D 800 bar
E 7000 foo
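I want to find the difference between each position and the next position that has the opposite value in the foobar column. Normally I would use the shift method to shift the position column: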
df[comparisonCol].shift(-1) - df[comparisonCol]
But since I am using the foobar column to decide which position applies, I am not sure how to do this.
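For reference, here is a minimal sketch of what the plain shift gives on this data. The dict-based construction and the name `naive` are only for illustration. Rows A and C come out differently from what I want, because the next row has the same foobar value.

```python
import pandas as pd

df = pd.DataFrame(
    {"position": [10, 440, 790, 800, 7000],
     "foobar": ["foo", "foo", "bar", "bar", "foo"]},
    index=list("ABCDE"),
)

# Plain row-to-row difference: it ignores the foobar column entirely.
naive = df["position"].shift(-1) - df["position"]
print(naive)
# A -> 430.0 (wanted 780), C -> 10.0 (wanted 6210); B and D happen to be right.
```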
The result should look like this:
position foobar difference
A 10 foo 780
B 440 foo 350
C 790 bar 6210
D 800 bar 6200
E 7000 foo NaN
I think this works if foobar has only 2 unique values, because then you can shift between the consecutive groups labelled in the Series a:
#identify consecutive groups
a = df['foobar'].ne(df['foobar'].shift()).cumsum()
print (a)
A 1
B 1
C 2
D 2
E 3
Name: foobar, dtype: int32
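The one-liner for a chains three steps. A minimal sketch that separates them, using a standalone Series and the hypothetical names `prev`, `changed` and `group_id`, might look like this:

```python
import pandas as pd

foobar = pd.Series(["foo", "foo", "bar", "bar", "foo"], index=list("ABCDE"))

prev = foobar.shift()        # value of the previous row (NaN for the first row)
changed = foobar.ne(prev)    # True wherever a new run of equal values starts
group_id = changed.cumsum()  # counting the run starts gives each run a label

print(pd.DataFrame({"foobar": foobar, "prev": prev,
                    "changed": changed, "group_id": group_id}))
```

The first row always counts as a run start, because comparing any value with NaN yields "not equal".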
#get the first position value of each group in a
b = df.groupby(a)['position'].first()
print (b)
foobar
1 10
2 790
3 7000
Name: position, dtype: int64
#map each row to the first position of the next group (a + 1), then subtract its own position
df['difference'] = a.add(1).map(b) - df['position']
print (df)
position foobar difference
A 10 foo 780.0
B 440 foo 350.0
C 790 bar 6210.0
D 800 bar 6200.0
E 7000 foo NaN
Details:
print (a.add(1).map(b))
A 790.0
B 790.0
C 7000.0
D 7000.0
E NaN
Name: foobar, dtype: float64
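Putting the steps together, a self-contained sketch could wrap them in a helper. The function name `diff_to_next_group` and its parameters are hypothetical, not part of pandas. Note that it really computes the difference to the first row of the next run, which equals the "opposite value" only under the assumption that foobar has exactly 2 unique values.

```python
import pandas as pd


def diff_to_next_group(df, value_col="position", key_col="foobar"):
    """Difference between each row's value and the first value of the next run of key_col."""
    # Label runs of consecutive equal key values: 1, 1, 2, 2, 3, ...
    run_id = df[key_col].ne(df[key_col].shift()).cumsum()
    # First value of each run, indexed by run label.
    first_of_run = df.groupby(run_id)[value_col].first()
    # Look up the first value of the *next* run; the last run has none, so it maps to NaN.
    return run_id.add(1).map(first_of_run) - df[value_col]


df = pd.DataFrame(
    {"position": [10, 440, 790, 800, 7000],
     "foobar": ["foo", "foo", "bar", "bar", "foo"]},
    index=list("ABCDE"),
)
df["difference"] = diff_to_next_group(df)
print(df)
```

The result column is float because the last run has no successor and NaN forces a float dtype.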