Using Python to remove records from one df that are not in another df



I have a sample dataframe1:

date           username         cities
2021-03-01     K John           New york
2021-03-01     K John           LA
2021-03-02     Ken Miles        Florida
2021-03-02     Ken Miles        LA

dataframe2 contains:

date          username        planned_cities 
2021-03-01    K John             Alabama
2021-03-02    K John             LA
2021-03-02    Ken Miles          Florida
2021-03-02    Ken Miles          California

Expected result (matching only on date and username; drop the rows whose date/username pair is not in df1):

date         username        planned_cities
2021-03-01    K John             Alabama
2021-03-02    Ken Miles          Florida
2021-03-02    Ken Miles          California

Since 2021-03-02 K John does not appear among the records of df1, that row is dropped. How can I do this?

Use an inner merge, dropping duplicates from the key columns of df1 first so that the left DataFrame does not grow.

df2.merge(df1[['date', 'username']].drop_duplicates())
         date   username planned_cities
0  2021-03-01     K John        Alabama
1  2021-03-02  Ken Miles        Florida
2  2021-03-02  Ken Miles     California
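To see why the drop_duplicates step matters, here is a minimal self-contained sketch. The frames are rebuilt from the sample data in the question with dates kept as plain strings, which is an assumption. The names keys, unique_keys, filtered and grown are illustrative only, and on= and how= are spelled out for readability rather than relying on the defaults.

```python
import pandas as pd

# Sample data from the question (dates kept as strings: an assumption).
df1 = pd.DataFrame({
    "date": ["2021-03-01", "2021-03-01", "2021-03-02", "2021-03-02"],
    "username": ["K John", "K John", "Ken Miles", "Ken Miles"],
    "cities": ["New york", "LA", "Florida", "LA"],
})
df2 = pd.DataFrame({
    "date": ["2021-03-01", "2021-03-02", "2021-03-02", "2021-03-02"],
    "username": ["K John", "K John", "Ken Miles", "Ken Miles"],
    "planned_cities": ["Alabama", "LA", "Florida", "California"],
})

keys = ["date", "username"]

# Each date/username pair appears once, so a df2 row matches at most one row.
unique_keys = df1[keys].drop_duplicates()
filtered = df2.merge(unique_keys, on=keys, how="inner")

# Without drop_duplicates, every matching df2 row is repeated once per
# matching df1 row, so the result grows.
grown = df2.merge(df1[keys], on=keys, how="inner")

print(len(filtered), len(grown))  # 3 6
```

Here each pair occurs twice in df1, so skipping the de-duplication doubles every surviving row of df2.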

You can build a MultiIndex from the columns of interest, use Index.isin on it, and then apply boolean indexing:

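import pandas as pd

cols = ['date', 'username']
# one (date, username) tuple per row of each frame
idx1 = pd.MultiIndex.from_frame(df1[cols])
idx2 = pd.MultiIndex.from_frame(df2[cols])
# keep the df2 rows whose pair also occurs in df1
out = df2[idx2.isin(idx1)]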

      date   username planned_cities
2021-03-01     K John        Alabama
2021-03-02  Ken Miles        Florida
2021-03-02  Ken Miles     California
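One practical difference from the merge approach is that boolean indexing keeps the original row labels of df2, whereas merge returns a fresh 0..n-1 index. A minimal sketch of this, again assuming the sample data from the question with dates as strings (mask is an illustrative name):

```python
import pandas as pd

# Sample data from the question (dates kept as strings: an assumption).
df1 = pd.DataFrame({
    "date": ["2021-03-01", "2021-03-01", "2021-03-02", "2021-03-02"],
    "username": ["K John", "K John", "Ken Miles", "Ken Miles"],
    "cities": ["New york", "LA", "Florida", "LA"],
})
df2 = pd.DataFrame({
    "date": ["2021-03-01", "2021-03-02", "2021-03-02", "2021-03-02"],
    "username": ["K John", "K John", "Ken Miles", "Ken Miles"],
    "planned_cities": ["Alabama", "LA", "Florida", "California"],
})

cols = ["date", "username"]

# Boolean array: True where the (date, username) pair of a df2 row
# also occurs somewhere in df1. Duplicates in df1 do not matter here.
mask = pd.MultiIndex.from_frame(df2[cols]).isin(pd.MultiIndex.from_frame(df1[cols]))

out = df2[mask]
print(out.index.tolist())  # [0, 2, 3]
```

If a clean 0..n-1 index is wanted afterwards, out.reset_index(drop=True) gives one.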
