Select the first value from one column and the last value from a second column, based on the index



I have the following DataFrame df:

   id  no    name        percentage  score     result
0  0   0.30  pencils     0.84        0.974185  1
1  1   0.18  computer    1.14        1.0       1
2  2   0.27  laptop      1.32        1.0       1
0  1   0.84  vegetables  1.770008    0.99992   4
1  2   0.27  meat        1.85        1.0       1
0  1   0.84  vegetables  1.770008    0.99992   4
1  2   0.27  meat        1.32        1.0       1
2  1   0.84  vegetables  1.770008    0.99992   4
3  2   0.27  fruits      1.5         1.0       1

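I want to select the first value of no from each index chain (each run of index labels that restarts at 0) and the last value of percentage from each index chain, as shown below, but keeping the original index.

df1:

   no
0  0.30
0  0.84
0  0.84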

df2:

   percentage
2  1.32
1  1.85
3  1.5

Joined together as follows:

   no    percentage  subtracted
0  0.30  1.32        1.02
0  0.84  1.85        1.01
0  0.84  1.5         0.66

I tried:

df1 = data2['no'][data2.index[0]]
df2 = data2['percentage'][data2.index[-1]]

and then tried to subtract:

subtracted = df2 - df1

This resulted in all NaN values.

I am trying this, but I am unable to get the index.
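
A note on the NaN result: the post does not show enough of data2 to confirm the cause, but one common way to end up with all NaN is pandas aligning two Series on their index labels before subtracting. If the "first" values carry the labels 0, 0, 0 and the "last" values carry the labels 2, 1, 3, no label appears on both sides, so every result is NaN. The sketch below uses values copied from the sample frame; the names first_vals and last_vals are made up for illustration.

```python
import pandas as pd

# Illustrative Series: values taken from the sample frame, labels as in df1/df2
first_vals = pd.Series([0.30, 0.84, 0.84], index=[0, 0, 0])
last_vals = pd.Series([1.32, 1.85, 1.5], index=[2, 1, 3])

# Series arithmetic aligns on index labels; no label is shared, so all results are NaN
aligned = last_vals - first_vals

# Working on the raw arrays ignores the labels and subtracts position by position
by_position = last_vals.to_numpy() - first_vals.to_numpy()
```

Subtracting by position gives the 1.02, 1.01, 0.66 column from the desired output, which suggests the labels rather than the values are what gets in the way.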

Try detecting the blocks by counting where the index restarts, then use groupby:

(df.groupby(df.index.to_series().diff().lt(0).cumsum())
.agg({'no':'first', 'percentage':'last'})
.assign(subtracted=lambda x: x['percentage'] - x['no'])
)

Output:

     no  percentage  subtracted
0  0.30        1.32        1.02
1  0.84        1.85        1.01
2  0.84        1.50        0.66
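
The groupby result above is indexed by block number rather than by the original labels the question asked for. A minimal sketch of one way to keep them, assuming each block is a run of increasing index labels: GroupBy.head(1) and GroupBy.tail(1) return rows of the original frame with their index untouched. The frame below rebuilds only the two relevant columns from the sample data, and the names block, df1, df2 and result are chosen here for illustration.

```python
import pandas as pd

# Only the two columns used in the question, with values copied from the sample frame
df = pd.DataFrame(
    {
        "no": [0.30, 0.18, 0.27, 0.84, 0.27, 0.84, 0.27, 0.84, 0.27],
        "percentage": [0.84, 1.14, 1.32, 1.770008, 1.85, 1.770008, 1.32, 1.770008, 1.5],
    },
    index=[0, 1, 2, 0, 1, 0, 1, 2, 3],
)

# A new block starts wherever the index label drops; the running count of drops is the block id
block = (df.index.to_series().diff() < 0).cumsum().to_numpy()

# head(1)/tail(1) keep the original index labels of the selected rows
df1 = df.groupby(block).head(1)[["no"]]            # first row of each block
df2 = df.groupby(block).tail(1)[["percentage"]]    # last row of each block

# Combine by position (df1 and df2 carry different labels), reusing df1's labels
result = pd.DataFrame(
    {"no": df1["no"].to_numpy(), "percentage": df2["percentage"].to_numpy()},
    index=df1.index,
)
result["subtracted"] = result["percentage"] - result["no"]
```

Here df1 keeps the labels 0, 0, 0 and df2 keeps 2, 1, 3, matching the df1 and df2 shown in the question. The final frame reuses df1's labels, which is what the joined table in the question appears to do.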

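Another, simpler approach (this assumes every block starts at index label 0):

import numpy as np
import pandas as pd

idx = df.index.to_numpy()
# take all lowest values, based on index: 'no' from every row labelled 0
lowest = df.loc[idx == 0, 'no'].to_numpy()
# find the last row of each block: the next label is smaller, or it is the final row
is_last = np.append(idx[1:] < idx[:-1], True)
highest = df.loc[is_last, 'percentage'].to_numpy()
# Build the result from plain arrays so pandas does not align on the mismatched labels
output = pd.DataFrame({'no': lowest, 'percentage': highest})
output['subtracted'] = output['percentage'] - output['no']  # also do the subtraction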
