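Is it possible to write a loop that uses two different DataFrames (over a range, e.g. iloc 0 to 10, 0 to 21, etc.) and collects the final result in a new DataFrame?
Below is the manual approach I am using at the moment. It works, but it is very inefficient.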
ndf0 = df[(df['result'] == rc.iloc[0]['result'])].drop_duplicates(['result'], keep='last')
ndf1 = df[(df['result'] == rc.iloc[1]['result'])].drop_duplicates(['result'], keep='last')
ndf2 = df[(df['result'] == rc.iloc[2]['result'])].drop_duplicates(['result'], keep='last')
ndf3 = df[(df['result'] == rc.iloc[3]['result'])].drop_duplicates(['result'], keep='last')
ndf4 = df[(df['result'] == rc.iloc[4]['result'])].drop_duplicates(['result'], keep='last')
ndf5 = df[(df['result'] == rc.iloc[5]['result'])].drop_duplicates(['result'], keep='last')
ndf6 = df[(df['result'] == rc.iloc[6]['result'])].drop_duplicates(['result'], keep='last')
ndf7 = df[(df['result'] == rc.iloc[7]['result'])].drop_duplicates(['result'], keep='last')
ndf8 = df[(df['result'] == rc.iloc[8]['result'])].drop_duplicates(['result'], keep='last')
..etc...
frames = [ndf0, ndf1, ndf2, ndf3, ndf4, ndf5, ndf6, ndf7, ndf8, etc..]
result = pd.concat(frames)
Many thanks, regards
Sample tables:

df (columns: result, rating)
[table garbled; surviving result values: purple, blue, yellow, green, tan, white, black (rating 67)]

rc (column: result)
blue, yellow, red, tan, white

ndf (columns: result, rating)
[table garbled; surviving result values: blue, yellow, white]

Use merge:
>>> rc.merge(df, on='result', how='left').drop_duplicates('result', keep='last')
result rating
0 blue 33
1 yellow 54
2 red 64
3 tan 47
4 white 95
Note: your sample does not contain any duplicates.
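To see what keep='last' does once df does contain duplicates, here is a minimal sketch. The data is invented for illustration and is not the asker's table; only merge and drop_duplicates are taken from the answer above.

```python
import pandas as pd

# Hypothetical data: 'blue' and 'red' each appear twice in df.
df = pd.DataFrame({
    "result": ["purple", "blue", "red", "blue", "white", "red"],
    "rating": [10, 20, 30, 40, 50, 60],
})
rc = pd.DataFrame({"result": ["blue", "red", "tan"]})

# A left merge keeps every row of rc, in rc's order.
# Keys with no match in df ('tan' here) get NaN in the rating column.
merged = rc.merge(df, on="result", how="left")

# Keep only the last matching row for each result value.
ndf = merged.drop_duplicates("result", keep="last").reset_index(drop=True)
print(ndf)
```

With how='left', a value of rc that never occurs in df still shows up as a row with a missing rating. If those rows are not wanted, how='inner' should drop them.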
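It looks like you want to select the rows of df whose df['result'] value also appears in rc['result'].
If so, you could build a boolean mask to select those rows and then drop the duplicates, keeping the last row for each value.
Try this: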
mask = (df['result'].isin(rc['result'].to_list()))
new_df = df[mask].drop_duplicates(['result'], keep='last')
Hope this helps.
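As a rough check that the mask does the same job as the manual ndf0, ndf1, ... frames from the question, the sketch below builds those frames in a loop and compares the two results. The sample data is invented for illustration.

```python
import pandas as pd

# Hypothetical data, not the asker's table.
df = pd.DataFrame({
    "result": ["purple", "blue", "red", "blue", "white", "red"],
    "rating": [10, 20, 30, 40, 50, 60],
})
rc = pd.DataFrame({"result": ["blue", "red", "tan"]})

# Loop version of the manual approach: one filtered frame per value in rc.
frames = [
    df[df["result"] == value].drop_duplicates(["result"], keep="last")
    for value in rc["result"]
]
looped = pd.concat(frames)

# Mask version: a single vectorised membership test over the whole column.
mask = df["result"].isin(rc["result"])
masked = df[mask].drop_duplicates(["result"], keep="last")

# Both approaches select the same rows; only the row order may differ
# (the loop follows rc's order, the mask follows df's order).
same = looped.sort_index().equals(masked.sort_index())
print(same)
```

The loop scans df once per row of rc, while isin scans it once in total, which is likely why the mask scales better as rc grows.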
Hmm... wouldn't this just work directly?
df = df[rc.columns.tolist()]
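For reference, a small sketch of what selecting by rc.columns does, using invented data. It keeps the columns of df whose names appear in rc and leaves the rows unfiltered, so it may not be enough on its own for the question as asked.

```python
import pandas as pd

# Hypothetical data, not the asker's table.
df = pd.DataFrame({
    "result": ["purple", "blue", "red"],
    "rating": [10, 20, 30],
})
rc = pd.DataFrame({"result": ["blue", "red"]})

# Select the columns of df that are named in rc (here only 'result').
cols_only = df[rc.columns.tolist()]

print(cols_only.columns.tolist())  # column names kept
print(len(cols_only))              # number of rows kept
```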