I have the following DataFrame df:
   id    no        name  percentage     score  result
0   0  0.30     pencils        0.84  0.974185       1
1   1  0.18    computer        1.14       1.0       1
2   2  0.27      laptop        1.32       1.0       1
0   1  0.84  vegetables    1.770008   0.99992       4
1   2  0.27        meat        1.85       1.0       1
0   1  0.84  vegetables    1.770008   0.99992       4
1   2  0.27        meat        1.32       1.0       1
2   1  0.84  vegetables    1.770008   0.99992       4
3   2  0.27      fruits         1.5       1.0       1
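For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the frame. It assumes the unlabeled first column of the table is the index (which restarts at 0 for each block) and that the remaining six columns are id, no, name, percentage, score and result.

```python
import pandas as pd

# Sample data copied from the table above.
# Assumption: the unlabeled first column is the (non-unique) index.
df = pd.DataFrame(
    {
        "id": [0, 1, 2, 1, 2, 1, 2, 1, 2],
        "no": [0.30, 0.18, 0.27, 0.84, 0.27, 0.84, 0.27, 0.84, 0.27],
        "name": ["pencils", "computer", "laptop", "vegetables", "meat",
                 "vegetables", "meat", "vegetables", "fruits"],
        "percentage": [0.84, 1.14, 1.32, 1.770008, 1.85,
                       1.770008, 1.32, 1.770008, 1.5],
        "score": [0.974185, 1.0, 1.0, 0.99992, 1.0, 0.99992, 1.0, 0.99992, 1.0],
        "result": [1, 1, 1, 4, 1, 4, 1, 4, 1],
    },
    index=[0, 1, 2, 0, 1, 0, 1, 2, 3],
)

print(df)
```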
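I want to select the first value of no from each index chain (each run of index values that restarts at 0) and the last value of percentage from each index chain, as shown below, but keeping the original index.
df1: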
no
0 0.30
0 0.84
0 0.84
df2:
percentage
2 1.32
1 1.85
3 1.5
Then concatenate them as follows:
     no  percentage  subtracted
0  0.30        1.32        1.02
0  0.84        1.85        1.01
0  0.84         1.5        0.66
I tried the following:
df1 = data2['no'][data2.index[0]]
df2 = data2['percentage'][data2.index[-1]]
and then tried to subtract:
subtracted = df2 - df1
This resulted in all NaN values.
This is what I am trying, but I cannot get the index.
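The all-NaN result is consistent with how pandas does arithmetic between two Series: values are matched by index label, not by position. The sketch below uses two small hand-built Series (the names `first` and `last` are illustrative, with values copied from df1 and df2 above) to show the effect and one positional workaround.

```python
import pandas as pd

# Values and index labels copied from the desired df1 and df2 above.
first = pd.Series([0.30, 0.84, 0.84], index=[0, 0, 0], name="no")
last = pd.Series([1.32, 1.85, 1.50], index=[2, 1, 3], name="percentage")

# Series arithmetic aligns on index labels. No label occurs in both
# Series here, so every entry of the result is NaN.
aligned = last - first
print(aligned)

# Subtracting the underlying NumPy arrays ignores the labels and
# works row by row instead.
positional = last.to_numpy() - first.to_numpy()
print(positional)
```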
Try detecting the blocks with a cumulative count of the points where the index drops, then groupby:
(df.groupby(df.index.to_series().diff().lt(0).cumsum())
.agg({'no':'first', 'percentage':'last'})
.assign(subtracted=lambda x: x['percentage'] - x['no'])
)
Output:
no percentage subtracted
0 0.30 1.32 1.02
1 0.84 1.85 1.01
2 0.84 1.50 0.66
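Note that agg replaces the index with the block number. If the original index labels matter, as the question asks, one possible variation is to use head(1) and tail(1) on the same grouping, since those return the original rows rather than aggregates. This is a sketch under the assumption that the frame looks like the sample above; only the two relevant columns are rebuilt here, and the names `block` and `result` are illustrative.

```python
import pandas as pd

# Trimmed copy of the sample data: only the two columns used here.
df = pd.DataFrame(
    {
        "no": [0.30, 0.18, 0.27, 0.84, 0.27, 0.84, 0.27, 0.84, 0.27],
        "percentage": [0.84, 1.14, 1.32, 1.770008, 1.85,
                       1.770008, 1.32, 1.770008, 1.5],
    },
    index=[0, 1, 2, 0, 1, 0, 1, 2, 3],
)

# A negative step in the index marks the start of a new block;
# the cumulative sum turns those marks into block numbers.
block = df.index.to_series().diff().lt(0).cumsum().to_numpy()

# head(1)/tail(1) keep the original index labels of the selected rows.
df1 = df.groupby(block)[["no"]].head(1)
df2 = df.groupby(block)[["percentage"]].tail(1)

# Combine positionally, because df1 and df2 carry different labels.
result = pd.DataFrame(
    {"no": df1["no"].to_numpy(), "percentage": df2["percentage"].to_numpy()},
    index=df1.index,
)
result["subtracted"] = result["percentage"] - result["no"]
print(result)
```

Here the combined frame reuses the index of df1, which matches the layout shown in the question; any other index could be passed instead.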
Another, simpler approach:
import pandas as pd

# First 'no' of each block: every row whose index label is 0.
lowest = df.loc[[0], 'no']
# Last 'percentage' of each block: rows followed by a drop in the index, plus the final row.
ends = df.index.to_series().diff().shift(-1, fill_value=-1).lt(0).to_numpy()
highest = df.loc[ends, 'percentage']
# Build the result positionally, because the two selections carry different index labels.
output = pd.DataFrame({'no': lowest.to_numpy(), 'percentage': highest.to_numpy()})
output['subtracted'] = output['percentage'] - output['no']  # also do the subtraction