我有两个类似的数据帧:
df1 = pd.DataFrame(data = {'col1' : ['finance', 'accounting'], 'col2' : ['f1', 'a1']})
df2 = pd.DataFrame(data = {'col1' : ['finance', 'finance', 'finance', 'accounting', 'accounting','IT','IT'], 'col2' : ['f1','f2','f3','a1,'a2','I1','I2']})
df1
col1 col2
0 finance f1
1 accounting a1
df2
col1 col2
0 finance f1
1 finance f2
2 finance f3
3 accounting a1
4 accounting a2
5 IT I1
6 IT I2
我想在col1上执行LEFT JOIN,在col2上执行ANTI JOIN。输出应该是这样的:
col1 col2
finance f2
finance f3
accounting a2
有人能帮助我如何在熊猫身上正确地做这件事吗。我在熊猫身上试过join
和merge
,但都不起作用。提前谢谢。
您可以merge
并过滤:
(df1.merge(df2, on='col1', suffixes=('_', None))
.loc[lambda d: d['col2'] != d.pop('col2_')]
)
输出:
col1 col2
1 finance f2
2 finance f3
4 accounting a2
只是为了好玩,这里有另一种方法(除了@mozway的真正优雅的解决方案(:
df2 = ( df2
.reset_index() # save index as column 'index'
.set_index('col1') # make 'col1' the index
.loc[df1.col1,:] # filter for 'col1' values in df1
.set_index('col2', append=True) # add 'col2' to the index
.drop(index=df1.
set_index(list(df1.columns))
.index) # create a multi-index from df1 and drop all matches from df2
.reset_index() # make 'col1' and col2' columns again
.set_index('index') # make 'index' the index again
.rename_axis(index=None) ) # make the index anonymous
输出:
col1 col2
1 finance f2
2 finance f3
4 accounting a2