I have a sample dataframe1:
date username cities
2021-03-01 K John New york
2021-03-01 K John LA
2021-03-02 Ken Miles Florida
2021-03-02 Ken Miles LA
dataframe2 contains:
date username planned_cities
2021-03-01 K John Alabama
2021-03-02 K John LA
2021-03-02 Ken Miles Florida
2021-03-02 Ken Miles California
Expected result (match on date and username only, and drop the rows of dataframe2 whose date/username pair is not in df1):
date username planned_cities
2021-03-01 K John Alabama
2021-03-02 Ken Miles Florida
2021-03-02 Ken Miles California
Since 2021-03-02 K John is not among the records in df1, that row is dropped. How can I do this?
Use an inner merge. Drop duplicates from the df1 key columns first, so that the merge does not grow the left DataFrame (df2).
df2.merge(df1[['date', 'username']].drop_duplicates())
date username planned_cities
0 2021-03-01 K John Alabama
1 2021-03-02 Ken Miles Florida
2 2021-03-02 Ken Miles California
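For anyone who wants to run this, here is a minimal self-contained sketch of the merge approach. The DataFrames are rebuilt from the sample data in the question, with dates kept as plain strings (an assumption; the original dtypes are not shown). The names keys, merged and grown are only illustrative.

```python
import pandas as pd

# Sample data rebuilt from the question (dates kept as strings for simplicity).
df1 = pd.DataFrame({
    "date": ["2021-03-01", "2021-03-01", "2021-03-02", "2021-03-02"],
    "username": ["K John", "K John", "Ken Miles", "Ken Miles"],
    "cities": ["New york", "LA", "Florida", "LA"],
})
df2 = pd.DataFrame({
    "date": ["2021-03-01", "2021-03-02", "2021-03-02", "2021-03-02"],
    "username": ["K John", "K John", "Ken Miles", "Ken Miles"],
    "planned_cities": ["Alabama", "LA", "Florida", "California"],
})

keys = ["date", "username"]

# Inner merge against the unique (date, username) pairs of df1.
merged = df2.merge(df1[keys].drop_duplicates(), on=keys, how="inner")
print(merged)

# Without drop_duplicates, each df2 row is repeated once per matching df1 row.
grown = df2.merge(df1[keys], on=keys, how="inner")
print(len(merged), len(grown))
```

Passing on=keys explicitly is optional here, since merge defaults to the columns the two frames share, but it makes the join keys obvious to the reader.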
Alternatively, you can build a MultiIndex from the columns of interest, use Index.isin, and then apply boolean indexing:
cols = ['date','username']
idx1 = pd.MultiIndex.from_frame(df1[cols])
idx2 = pd.MultiIndex.from_frame(df2[cols])
out = df2[idx2.isin(idx1)]
date username planned_cities
2021-03-01 K John Alabama
2021-03-02 Ken Miles Florida
2021-03-02 Ken Miles California
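A runnable sketch of the isin approach, again with the sample data rebuilt from the question and dates kept as strings (an assumption). Unlike the merge, boolean indexing keeps the original row labels of df2, so the final reset_index call is only needed if you want a fresh 0..n-1 index; the names mask and out_reset are illustrative.

```python
import pandas as pd

# Sample data rebuilt from the question (dates kept as strings for simplicity).
df1 = pd.DataFrame({
    "date": ["2021-03-01", "2021-03-01", "2021-03-02", "2021-03-02"],
    "username": ["K John", "K John", "Ken Miles", "Ken Miles"],
    "cities": ["New york", "LA", "Florida", "LA"],
})
df2 = pd.DataFrame({
    "date": ["2021-03-01", "2021-03-02", "2021-03-02", "2021-03-02"],
    "username": ["K John", "K John", "Ken Miles", "Ken Miles"],
    "planned_cities": ["Alabama", "LA", "Florida", "California"],
})

cols = ["date", "username"]

# Turn the key columns of each frame into a MultiIndex of (date, username) pairs.
idx1 = pd.MultiIndex.from_frame(df1[cols])
idx2 = pd.MultiIndex.from_frame(df2[cols])

# True for each df2 row whose pair also occurs in df1.
mask = idx2.isin(idx1)

out = df2[mask]                        # keeps df2's original row labels
out_reset = out.reset_index(drop=True) # optional: renumber the rows
print(out)
```

Duplicates in df1 do not matter for this approach, because isin only tests membership and never repeats rows of df2.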