pandas/Python: loop over two DataFrames and write the final result to a new df



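Is it possible to create a loop over two different DataFrames (within a range, e.g. iloc 0 to 10, 0 to 21, etc.) and write the final result to a new df?

Below is the manual method I am using. However, it is very inefficient.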

ndf0 = df[(df['result'] == rc.iloc[0]['result'])].drop_duplicates(['result'], keep='last')
ndf1 = df[(df['result'] == rc.iloc[1]['result'])].drop_duplicates(['result'], keep='last')
ndf2 = df[(df['result'] == rc.iloc[2]['result'])].drop_duplicates(['result'], keep='last')
ndf3 = df[(df['result'] == rc.iloc[3]['result'])].drop_duplicates(['result'], keep='last')
ndf4 = df[(df['result'] == rc.iloc[4]['result'])].drop_duplicates(['result'], keep='last')
ndf5 = df[(df['result'] == rc.iloc[5]['result'])].drop_duplicates(['result'], keep='last')
ndf6 = df[(df['result'] == rc.iloc[6]['result'])].drop_duplicates(['result'], keep='last')
ndf7 = df[(df['result'] == rc.iloc[7]['result'])].drop_duplicates(['result'], keep='last')
ndf8 = df[(df['result'] == rc.iloc[8]['result'])].drop_duplicates(['result'], keep='last')
..etc...
frames = [ndf0, ndf1, ndf2, ndf3, ndf4, ndf5, ndf6, ndf7, ndf8, etc..]
result = pd.concat(frames)

Many thanks, regards

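Sample tables:

df

╔════════╦════════╗
║ result ║ rating ║
╠════════╬════════╣
║ purple ║ …      ║
║ blue   ║ 33     ║
║ yellow ║ 54     ║
║ green  ║ …      ║
║ …      ║ …      ║
║ tan    ║ 47     ║
║ white  ║ 95     ║
║ …      ║ …      ║
║ …      ║ …      ║
║ black  ║ 67     ║
╚════════╩════════╝

rc

╔════════╗
║ result ║
╠════════╣
║ blue   ║
║ yellow ║
║ red    ║
║ tan    ║
║ white  ║
╚════════╝

ndf

╔════════╦════════╗
║ result ║ rating ║
╠════════╬════════╣
║ blue   ║ 33     ║
║ yellow ║ 54     ║
║ red    ║ 64     ║
║ tan    ║ 47     ║
║ white  ║ 95     ║
╚════════╩════════╝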

Use merge:

>>> rc.merge(df, on='result', how='left').drop_duplicates('result', keep='last')
   result  rating
0    blue      33
1  yellow      54
2     red      64
3     tan      47
4   white      95

Note: your sample data contains no duplicates.
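Since the sample has no duplicates, here is a minimal, self-contained sketch of the same merge on made-up data where 'blue' appears twice in df and 'red' is missing from it. The values are hypothetical and only meant to show what `keep='last'` and `how='left'` do; the `reset_index` call is an optional addition.

```python
import pandas as pd

# Hypothetical data: 'blue' occurs twice in df, 'red' does not occur at all.
df = pd.DataFrame({
    "result": ["purple", "blue", "yellow", "blue", "white"],
    "rating": [10, 20, 54, 33, 95],
})
rc = pd.DataFrame({"result": ["blue", "yellow", "red"]})

merged = (
    rc.merge(df, on="result", how="left")    # one row per match, in rc order
      .drop_duplicates("result", keep="last")  # keep the last match per value
      .reset_index(drop=True)
)
print(merged)
```

With `how='left'`, a value of rc that never occurs in df still yields a row, with NaN as its rating; `how='inner'` would drop such rows instead.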

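It looks like you want to select the rows of df whose df['result'] value appears in rc['result'].

If so, you could build a boolean mask to select those rows, then drop the duplicates, keeping the last row for each value.

Try this: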

mask = (df['result'].isin(rc['result'].to_list()))
new_df = df[mask].drop_duplicates(['result'], keep='last')

Hope this helps.
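As a rough check, the sketch below runs the mask approach next to a loop version of the manual method from the question, on hypothetical data (the frames and values are made up for illustration). Both should select the same rows; the difference is ordering.

```python
import pandas as pd

# Hypothetical data: 'blue' occurs twice in df, 'red' does not occur at all.
df = pd.DataFrame({
    "result": ["purple", "blue", "yellow", "blue", "white"],
    "rating": [10, 20, 54, 33, 95],
})
rc = pd.DataFrame({"result": ["white", "blue", "red"]})

# Mask approach: one vectorised membership test, rows stay in df order.
mask = df["result"].isin(rc["result"])
new_df = df[mask].drop_duplicates("result", keep="last")

# Loop form of the manual method: one filtered frame per rc value,
# concatenated in rc order.
frames = [
    df[df["result"] == value].drop_duplicates("result", keep="last")
    for value in rc["result"]
]
looped = pd.concat(frames)

print(new_df)
print(looped)
```

The mask version keeps the rows in df's order, while the loop version follows rc's order; to loop over only part of rc, slice it first, for example `rc['result'].iloc[0:10]`.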

Hmm... wouldn't this work directly?

df = df[rc.columns.tolist()]
