I have a data frame with 6 columns, as shown below:
c1 c2 c3 c4 c5 c6
C875 DOID_3263 1 9.65E-18 1 unknown
C783 DOID_4064 1 4.80E-17 1 unknown
C372 DOID_0050084 0.996 0.00429 0.996 unknown
C43 DOID_936 0.0457 0.954 0.954 known
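Column c5 holds the maximum of c3 and c4. I want to add a column after c6 that records where the maximum in c5 came from: put 0 if it came from c3, and 1 if it came from c4.
So the final result would look like this: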
c1 c2 c3 c4 c5 c6 c7
C875 DOID_3263 1 9.65E-18 1 unknown 0
C783 DOID_4064 1 4.80E-17 1 unknown 0
C372 DOID_0050084 0.996 0.00429 0.996 unknown 0
C43 DOID_936 0.0457 0.954 0.954 known 1
Any help would be appreciated.
First, take the maximum of the two columns:
df['c5'] = np.maximum(df['c3'], df['c4'])
If the maximum equals c4, put 1, otherwise put 0 (which, under this scheme, means it came from c3):
df['c7'] = (df['c5'] == df['c4']).astype(int)
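For anyone who wants to run the two lines above end to end, here is a minimal sketch. It assumes the sample data from the question is typed in by hand (how the original frame was loaded is not shown), and the final column reorder is only there to place c7 after c6.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the question; building the frame by hand is an assumption.
df = pd.DataFrame({
    "c1": ["C875", "C783", "C372", "C43"],
    "c2": ["DOID_3263", "DOID_4064", "DOID_0050084", "DOID_936"],
    "c3": [1.0, 1.0, 0.996, 0.0457],
    "c4": [9.65e-18, 4.80e-17, 0.00429, 0.954],
    "c6": ["unknown", "unknown", "unknown", "known"],
})

# Row-wise maximum of c3 and c4.
df["c5"] = np.maximum(df["c3"], df["c4"])

# 1 where the maximum equals c4, 0 otherwise (i.e. it came from c3).
df["c7"] = (df["c5"] == df["c4"]).astype(int)

# Put the columns in the order shown in the question.
df = df[["c1", "c2", "c3", "c4", "c5", "c6", "c7"]]
print(df)
```

Note that with this rule a row where c3 and c4 are exactly equal gets 1, because the comparison against c4 is true.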
Using idxmax gives you the name of the column directly:
df[['c3','c4']].idxmax(1)
0 c3
1 c3
2 c3
3 c4
dtype: object
If you need 0 or 1 instead, you can always map the names:
df[['c3','c4']].idxmax(1).map({'c3': 0, 'c4':1})
0 0
1 0
2 0
3 1
dtype: int64
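One thing that may matter in practice is what happens when c3 and c4 are equal. The sketch below uses a small made-up frame (the name `ties` and its values are hypothetical, not from the question) to compare idxmax, which reports the first column holding the maximum, with a direct comparison against c4.

```python
import pandas as pd

# Hypothetical data: the first row is a tie between c3 and c4.
ties = pd.DataFrame({"c3": [0.5, 0.2, 0.7], "c4": [0.5, 0.9, 0.1]})

# idxmax returns the first column that holds the maximum, so a tie maps to c3 -> 0.
by_idxmax = ties[["c3", "c4"]].idxmax(axis=1).map({"c3": 0, "c4": 1})

# Comparing the row maximum with c4 is true on a tie, so a tie maps to 1.
by_compare = (ties[["c3", "c4"]].max(axis=1) == ties["c4"]).astype(int)

print(by_idxmax.tolist())
print(by_compare.tolist())
```

The sample data in the question has no ties, so both approaches agree there; which tie rule is right depends on what the 0/1 flag is meant to express.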
Using select from numpy:
s1=df.c3==df.c5
s2=df.c4==df.c5
df['c7']=np.select([s1,s2],[0,1])
df
Out[670]:
c1 c2 c3 c4 c5 c6 c7
0 C875 DOID_3263 1.0000 9.650000e-18 1.000 unknown 0
1 C783 DOID_4064 1.0000 4.800000e-17 1.000 unknown 0
2 C372 DOID_0050084 0.9960 4.290000e-03 0.996 unknown 0
3 C43 DOID_936 0.0457 9.540000e-01 0.954 known 1
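Since there are only two possible outcomes here, np.where may be a shorter alternative to np.select. This is a different technique from the answer above, sketched under the assumption that ties should count as coming from c3; only the two numeric columns from the question are reproduced.

```python
import numpy as np
import pandas as pd

# Only the numeric columns from the question are needed for this sketch.
df = pd.DataFrame({
    "c3": [1.0, 1.0, 0.996, 0.0457],
    "c4": [9.65e-18, 4.80e-17, 0.00429, 0.954],
})

# 1 where c4 is strictly larger, otherwise 0 (ties are attributed to c3).
df["c7"] = np.where(df["c4"] > df["c3"], 1, 0)
print(df)
```

This version does not need c5 at all, since it compares the two source columns directly.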