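I have a DataFrame and need to map a category based on value conditions that involve two separate columns. The operation has to run over roughly one million rows.
A sample DataFrame: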
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1':['B','A','A','B','C','B','C','C','A'],
                   'col2':[10,30,40,20,60,30,70,80,50]})
The conditions for True are:
- A: >30
- B: >10
- C: >60
If the value in col2 meets the condition for its col1 category, the result is True (1); otherwise it is False (0).
The expected result is:
  col1  col2  result
0    B    10       0
1    A    30       0
2    A    40       1
3    B    20       1
4    C    60       0
5    B    30       1
6    C    70       1
7    C    80       1
8    A    50       1
You can chain the masks with | (bitwise OR):
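# & binds more tightly than |, so each (col1, col2) pair is evaluated first.
# The outer parentheses let the expression span several lines.
df['result'] = ((df['col1']=='A') & (df['col2']>30) |
                (df['col1']=='B') & (df['col2']>10) |
                (df['col1']=='C') & (df['col2']>60))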
Or, to get 1/0 instead of True/False:
df['result'] = np.where((df['col1']=='A') & (df['col2']>30) |
(df['col1']=='B') & (df['col2']>10) |
(df['col1']=='C') & (df['col2']>60), 1, 0)
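When the number of categories grows, chaining one mask per category gets long. A possible alternative, sketched here rather than taken from the thread, is to keep the thresholds in a dict and look them up per row with Series.map. The name `thresholds` is an assumption made for illustration, and the B threshold of 10 follows the code and the expected output above.

```python
import pandas as pd

df = pd.DataFrame({'col1': ['B', 'A', 'A', 'B', 'C', 'B', 'C', 'C', 'A'],
                   'col2': [10, 30, 40, 20, 60, 30, 70, 80, 50]})

# Hypothetical lookup table: category -> threshold that col2 must exceed.
thresholds = {'A': 30, 'B': 10, 'C': 60}

# map() turns each col1 label into its threshold; the comparison is then
# a single vectorized operation over the whole column.
# A label missing from the dict maps to NaN, and any comparison with NaN
# is False, so such rows end up as 0.
df['result'] = (df['col2'] > df['col1'].map(thresholds)).astype(int)

print(df)
```

This keeps the rule set as data, so adding a category means adding one dict entry instead of another mask.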
I tried doing it this way:
df['result'] = np.select([(df['col1']=='A') & (df['col2']>30),
(df['col1']=='A') & (df['col2']<=30),
(df['col1']=='B') & (df['col2']>10),
(df['col1']=='B') & (df['col2']<=10),
(df['col1']=='C') & (df['col2']>60),
(df['col1']=='C') & (df['col2']<=60),
],
[True,
False,
True,
False,
True,
False
]
)
However, I am not sure whether this is the best approach. Other answers are welcome.
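One way the np.select attempt could be shortened, offered as a sketch and not as the definitive answer: np.select accepts a `default` value for rows that match none of the conditions, so only the True cases need to be listed. The variable name `conditions` is illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': ['B', 'A', 'A', 'B', 'C', 'B', 'C', 'C', 'A'],
                   'col2': [10, 30, 40, 20, 60, 30, 70, 80, 50]})

# Only the conditions that should yield 1 are listed.
conditions = [
    (df['col1'] == 'A') & (df['col2'] > 30),
    (df['col1'] == 'B') & (df['col2'] > 10),
    (df['col1'] == 'C') & (df['col2'] > 60),
]

# Each condition maps to 1; every row matching none of them gets default=0.
df['result'] = np.select(conditions, [1, 1, 1], default=0)

print(df)
```

Because every listed choice is the same value, this is equivalent to OR-ing the three masks; np.select is more useful when the categories need different output values.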