Pandas不带~和not in操作符



我有两个数据框架,如下所示

ID,Name,Sub,Country
1,ABC,ENG,UK
1,ABC,MATHS,UK
1,ABC,Science,UK
2,ABE,ENG,USA
2,ABE,MATHS,USA
2,ABE,Science,USA
3,ABF,ENG,IND
3,ABF,MATHS,IND
3,ABF,Science,IND
df1 = pd.read_clipboard(sep=',')
ID,Name,class,age
11,ABC,ENG,21
12,ABC,MATHS,23
1,ABC,Science,25
22,ABE,ENG,19
23,ABE,MATHS,22
24,ABE,Science,26
33,ABF,ENG,24
31,ABF,MATHS,28
32,ABF,Science,26
df2 = pd.read_clipboard(sep=',')

我想做下面的事情

a)检查df1中的IDNamedf2中是否存在。

b)如果df2中存在,则将Yes放在Status列中,或将No放在Status列中。不要使用~not in操作符,因为我的df2有数百万行。因此,它将导致不相关的结果

我试过下面的

ID_list = df1['ID'].unique().tolist()
Name_list = df1['Name'].unique().tolist()
filtered_df = df2[((df2['ID'].isin(ID_list)) & (df2['Name'].isin(Name_list)))]
filtered_df = filtered_df.groupby(['ID','Name','Sub']).size().reset_index()

上面的代码给出了df1df2之间匹配的id和名称。

但是我想找到df1中存在但df2中缺失/缺席的idsnames我不能使用~运算符,因为它将返回df2中没有df1匹配的所有行。在现实世界中,我的df2有数百万行。我只想找到missing df1 ids and names and put a status column

我希望我的输出如下所示

ID,Name,Sub,Country, Status
1,ABC,ENG,UK,No
1,ABC,MATHS,UK,No
1,ABC,Science,UK,Yes
2,ABE,ENG,USA,No
2,ABE,MATHS,USA,No
2,ABE,Science,USA,No
3,ABF,ENG,IND,No
3,ABF,MATHS,IND,No
3,ABF,Science,IND,No

预期输出为匹配3列:

m = df1.merge(df2, 
left_on=['ID','Name','Sub'],
right_on=['ID','Name','class'], 
indicator=True, how='left')['_merge'].eq('both')
df1['Status'] = np.where(m, 'Yes', 'No')
print (df1)
ID Name      Sub Country Status
0   1  ABC      ENG      UK     No
1   1  ABC    MATHS      UK     No
2   1  ABC  Science      UK    Yes
3   2  ABE      ENG     USA     No
4   2  ABE    MATHS     USA     No
5   2  ABE  Science     USA     No
6   3  ABF      ENG     IND     No
7   3  ABF    MATHS     IND     No
8   3  ABF  Science     IND     No

isin测试解决方案为:

idx1 = pd.MultiIndex.from_frame(df1[['ID','Name','Sub']])
idx2 = pd.MultiIndex.from_frame(df2[['ID','Name','class']]) 
df1['Status'] = np.where(idx1.isin(idx2), 'Yes', 'No')
print (df1)
ID Name      Sub Country Status
0   1  ABC      ENG      UK     No
1   1  ABC    MATHS      UK     No
2   1  ABC  Science      UK    Yes
3   2  ABE      ENG     USA     No
4   2  ABE    MATHS     USA     No
5   2  ABE  Science     USA     No
6   3  ABF      ENG     IND     No
7   3  ABF    MATHS     IND     No
8   3  ABF  Science     IND     No

因为如果匹配两列输出是不同的:

m = df1.merge(df2, on=['ID','Name'], indicator=True, how='left')['_merge'].eq('both')
df1['Status'] = np.where(m, 'Yes', 'No')
print (df1)
ID Name      Sub Country Status
0   1  ABC      ENG      UK    Yes
1   1  ABC    MATHS      UK    Yes
2   1  ABC  Science      UK    Yes
3   2  ABE      ENG     USA     No
4   2  ABE    MATHS     USA     No
5   2  ABE  Science     USA     No
6   3  ABF      ENG     IND     No
7   3  ABF    MATHS     IND     No
8   3  ABF  Science     IND     No

最新更新