I have two dataframes with the same column names but different sizes:
DF1, shape (10x3):
Date Client ID
---------------------------
12-03-2020 Prada AAA
22-04-2020 Coutine BBB
02-02-2020 MarioG CCC
15-11-2020 Sublime DDD
19-08-2020 Sublime EEE
23-04-2020 Prada FFF
30-07-2020 MarioG GGG
11-10-2020 MarioG HHH
07-03-2020 Prada III
06-01-2020 Prada JJJ
DF2, shape (5x3):
Date Client ID
---------------------------
17-03-2020 MarioG CCC
25-05-2020 Sublime EEE
04-02-2020 Prada AAA
15-10-2020 Sublime DDD
30-08-2020 Coutine BBB
What I need is a new column in DF1 named "Status" that says "Y" or "N" depending on whether the row's ID exists in DF2. For example, the result would be:
DF1
Date Client ID Status
------------------------------------
12-03-2020 Prada AAA Y
22-04-2020 Coutine BBB Y
02-02-2020 MarioG CCC Y
15-11-2020 Sublime DDD Y
19-08-2020 Sublime EEE Y
23-04-2020 Prada FFF N
30-07-2020 MarioG GGG N
11-10-2020 MarioG HHH N
07-03-2020 Prada III N
06-01-2020 Prada JJJ N
I tried the following:
DF1["Status"] = ["Y" if DF1["ID"].values == DF2["ID"].values else "N" for x in DF1["ID"]]
But it gives me an error about the length or dimensions of the dataframes.
Any suggestions for solving this problem?
Thanks.
Create the data with the following:
s1 = '''
Date Client ID
12-03-2020 Prada AAA
22-04-2020 Coutine BBB
02-02-2020 MarioG CCC
15-11-2020 Sublime DDD
19-08-2020 Sublime EEE
23-04-2020 Prada FFF
30-07-2020 MarioG GGG
11-10-2020 MarioG HHH
07-03-2020 Prada III
06-01-2020 Prada JJJ
'''
s2 = '''
Date Client ID
17-03-2020 MarioG CCC
25-05-2020 Sublime EEE
04-02-2020 Prada AAA
15-10-2020 Sublime DDD
30-08-2020 Coutine BBB
'''
import io
import pandas as pd

# Split columns on runs of whitespace
df1 = pd.read_csv(io.StringIO(s1), sep=r'\s+')
df2 = pd.read_csv(io.StringIO(s2), sep=r'\s+')
Adding a Status column built from isin produces the desired result:
>>> df1['Status'] = df1['ID'].isin(df2['ID']).replace({True: 'Y', False: 'N'})
>>> df1
Date Client ID Status
0 12-03-2020 Prada AAA Y
1 22-04-2020 Coutine BBB Y
2 02-02-2020 MarioG CCC Y
3 15-11-2020 Sublime DDD Y
4 19-08-2020 Sublime EEE Y
5 23-04-2020 Prada FFF N
6 30-07-2020 MarioG GGG N
7 11-10-2020 MarioG HHH N
8 07-03-2020 Prada III N
9 06-01-2020 Prada JJJ N
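As a side note, the boolean mask returned by isin can be turned into labels in other ways as well. The following is only a minimal sketch, assuming small stand-in frames that hold just the ID column from the question; the names in_df2, status_where and status_map are made up for illustration.

```python
import numpy as np
import pandas as pd

# Stand-in frames with only the ID column (values taken from the question)
df1 = pd.DataFrame({"ID": ["AAA", "BBB", "CCC", "DDD", "EEE",
                           "FFF", "GGG", "HHH", "III", "JJJ"]})
df2 = pd.DataFrame({"ID": ["CCC", "EEE", "AAA", "DDD", "BBB"]})

# Boolean Series: True where the ID from df1 also appears in df2
in_df2 = df1["ID"].isin(df2["ID"])

# Option 1: np.where picks "Y" where the mask is True, "N" elsewhere
status_where = np.where(in_df2, "Y", "N")

# Option 2: map the booleans to labels with a dict
status_map = in_df2.map({True: "Y", False: "N"})

df1["Status"] = status_map
```

Both options give the same labels as the replace call; which one to use is mostly a matter of taste.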
Your initial solution, with a minor change to the list comprehension (which I suggested in the comments), also produces the correct result:
>>> import numpy as np
>>> np.array_equal(df1['Status'],
...     ['Y' if s in df2['ID'].values else 'N' for s in df1['ID'].values])
True
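One detail worth knowing about the comprehension: the .values part matters, because the in operator applied to a pandas Series tests the index labels rather than the stored values. Here is a small sketch of that behaviour, assuming shortened stand-in frames with a default integer index; known_ids is a hypothetical name, and building a set first is my own variation to keep each lookup cheap.

```python
import pandas as pd

# Shortened stand-in frames with a default 0..n-1 index
df1 = pd.DataFrame({"ID": ["AAA", "BBB", "CCC", "FFF"]})
df2 = pd.DataFrame({"ID": ["CCC", "AAA", "BBB"]})

# `in` on a Series checks the index labels, not the values
label_hit = 0 in df2["ID"]        # 0 is an index label
value_hit = "AAA" in df2["ID"]    # "AAA" is a value, not a label

# Collect the IDs of df2 in a set, then test each ID of df1 against it
known_ids = set(df2["ID"])
df1["Status"] = ["Y" if s in known_ids else "N" for s in df1["ID"]]
```

Here label_hit is True and value_hit is False, which is why the comprehension looks the IDs up in df2['ID'].values (or in a set) instead of in the Series itself.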