Create a new column based on the values of multiple columns



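Apologies if this is a duplicate. I have been searching for a long time and still get one of these errors: "TypeError: _select_dispatcher() got an unexpected keyword argument 'na'" or "TypeError: invalid entry 0 in condlist: should be boolean ndarray".

I have a DataFrame: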

import numpy as np
import pandas as pd

data_1 = {'A': ['Emo/3', 'Emo/4', 'Emo/1','Emo/3', '','Emo/3', 'Emo/4', 'Emo/1','Emo/3', '', 'Neu/5', 'Neu/2','Neu/5', 'Neu/2'],
          'Pos': ["repeat3", "repeat3", "repeat3", "repeat3", '',"repeat1", "repeat1", "repeat1", "repeat1", '', "repeat2", "repeat2","repeat2", "repeat2"],
          'B': [0, 0, 0, 0, '', 1, 2, 3, 4, '', 4, 2, 3, 1],
          'C': [0, 2, 1, 3, '', 4, 2, 3, 1, '', 4, 2, 3, 1]}
df_1 = pd.DataFrame(data_1)
df_1
        A      Pos  B  C
0   Emo/3  repeat3  0  0
1   Emo/4  repeat3  0  2
2   Emo/1  repeat3  0  1
3   Emo/3  repeat3  0  3
4
5   Emo/3  repeat1  1  4
6   Emo/4  repeat1  2  2
7   Emo/1  repeat1  3  3
8   Emo/3  repeat1  4  1
9
10  Neu/5  repeat2  4  4
11  Neu/2  repeat2  2  2
12  Neu/5  repeat2  3  3
13  Neu/2  repeat2  1  1

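I want to create a column D based on columns B and C. If a condition is met, D should be filled with a number; otherwise it should be left blank. Here is my code: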

conditions = [

df_1.loc[(df_1['B']==1)&(df_1['C']==1)],
df_1.loc[(df_1['B']==2)&(df_1['C']==1)],
df_1.loc[(df_1['B']==3)&(df_1['C']==1)],
]
choices = [1,1,0]
df_1['D'] = np.select(conditions, choices, default='')

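Don't use .loc in your conditions: np.select expects each condition to be a boolean array with one entry per row, whereas df_1.loc[...] returns the filtered rows themselves. Also, mixing strings and numbers in one column is not a good idea, so set the default to NaN rather than ''.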
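To make the difference concrete, here is a minimal sketch on a small made-up frame (the name `df` and its three rows are illustrative only, not the data from the question). It compares what the bare boolean expression produces with what the same expression wrapped in .loc produces:

```python
import pandas as pd

# Hypothetical three-row frame, used only to illustrate the two forms
df = pd.DataFrame({'B': [1, 2, 3], 'C': [1, 1, 4]})

# Boolean mask: one True/False per row, which is what np.select wants
mask = (df['B'] == 1) & (df['C'] == 1)

# .loc with the same mask: a DataFrame holding only the matching rows
subset = df.loc[mask]

print(mask.dtype, mask.shape)                # bool (3,)
print(type(subset).__name__, subset.shape)   # DataFrame (1, 2)
```

Because `subset` is a filtered DataFrame rather than a boolean array, passing it as a condition is most likely what triggers the "invalid entry 0 in condlist" error.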

Try:

conditions = [(df_1['B']==1)&(df_1['C']==1),
(df_1['B']==2)&(df_1['C']==1),
(df_1['B']==3)&(df_1['C']==1)]
choices = [1,1,0]
df_1['D'] = np.select(conditions, choices, default=np.nan)
>>> df_1
        A      Pos  B  C    D
0   Emo/3  repeat3  0  0  NaN
1   Emo/4  repeat3  0  2  NaN
2   Emo/1  repeat3  0  1  NaN
3   Emo/3  repeat3  0  3  NaN
4                         NaN
5   Emo/3  repeat1  1  4  NaN
6   Emo/4  repeat1  2  2  NaN
7   Emo/1  repeat1  3  3  NaN
8   Emo/3  repeat1  4  1  NaN
9                         NaN
10  Neu/5  repeat2  4  4  NaN
11  Neu/2  repeat2  2  2  NaN
12  Neu/5  repeat2  3  3  NaN
13  Neu/2  repeat2  1  1  1.0
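If the blank strings in B and C really stand for missing values, one optional follow-up is to convert those columns to numeric before comparing. The sketch below assumes exactly that and uses a small made-up frame (`df` and `num` are illustrative names, not from the question); `pd.to_numeric(..., errors='coerce')` turns anything non-numeric, including '', into NaN:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with the same kind of blank-string rows as in the question
df = pd.DataFrame({'B': [0, '', 1, 2, 3, 1],
                   'C': [0, '', 4, 1, 1, 1]})

# Assumption: '' means "missing". Coerce both columns to floats with NaN for blanks.
num = df[['B', 'C']].apply(pd.to_numeric, errors='coerce')

# Same conditions and choices as in the answer, evaluated on the numeric copy
conditions = [(num['B'] == 1) & (num['C'] == 1),
              (num['B'] == 2) & (num['C'] == 1),
              (num['B'] == 3) & (num['C'] == 1)]
choices = [1, 1, 0]
df['D'] = np.select(conditions, choices, default=np.nan)

print(df['D'].tolist())   # [nan, nan, nan, 1.0, 0.0, 1.0]
```

Comparisons against NaN evaluate to False, so the blank rows simply fall through to the default, and D ends up as a purely numeric (float) column.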
