Create a new column in pandas based on multiple conditions and multiple columns



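Given the columns below (which sit alongside other columns), I want to create a new column from these 3 columns that defines the final status of each row.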

status_1                        status_2       status_3
a_accepted_with_comment         a_revised     c_approved
a_accepted_with_comment         c_rejected       nan
a_rejected                      a_approved       nan
a_rejected                         nan           nan

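Across the 3 columns, if the last column that has a value shows c_approved, the new column should give approved.

Across the 3 columns, if the last column that has a value shows c_rejected, the new column should give rejected.

Across the 3 columns, if the last column that has a value shows a_approved, the new column should give revised.

Across the 3 columns, if the last column that has a value shows a_rejected, the new column should give rejected.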

The final table would look like this:

status_1                        status_2       status_3       final_status
a_accepted_with_comment         a_revised     c_approved       approved
a_accepted_with_comment         c_rejected       nan           rejected
a_rejected                      a_approved       nan           revised
a_rejected                         nan           nan           rejected

How can I create a new column like this, with multiple conditions, in Python?

Thanks in advance.

You can use ffill together with map and a dictionary to keep track of each criterion and its outcome.

response_rules = {
"c_approved": "approved",
"c_rejected": "rejected",
"a_approved": "revised",
"a_rejected": "rejected"
}
df["final_status"] = df.ffill(axis=1)["status_3"].map(response_rules)
print(df)
                  status_1    status_2    status_3 final_status
0  a_accepted_with_comment   a_revised  c_approved     approved
1  a_accepted_with_comment  c_rejected         NaN     rejected
2               a_rejected  a_approved         NaN      revised
3               a_rejected         NaN         NaN     rejected
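
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It assumes the missing entries are real NaN values (not the string "nan") and that the three status columns are named as in the question; `status_cols` is a name introduced here for illustration. Selecting the status columns before forward-filling is a precaution so that other columns in the frame cannot leak into the result.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question (assumes missing values are real NaN)
df = pd.DataFrame({
    "status_1": ["a_accepted_with_comment", "a_accepted_with_comment",
                 "a_rejected", "a_rejected"],
    "status_2": ["a_revised", "c_rejected", "a_approved", np.nan],
    "status_3": ["c_approved", np.nan, np.nan, np.nan],
})

response_rules = {
    "c_approved": "approved",
    "c_rejected": "rejected",
    "a_approved": "revised",
    "a_rejected": "rejected",
}

# Illustrative name: the columns that take part in the rule
status_cols = ["status_1", "status_2", "status_3"]

# Forward-fill along each row, so the right-most status column
# ends up holding the last non-missing status of that row
last_status = df[status_cols].ffill(axis=1)["status_3"]

# Translate the last status into the final label
df["final_status"] = last_status.map(response_rules)
print(df)
```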

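If you have a lot of rules, a better design pattern may be to keep an easy-to-read, easy-to-edit dictionary that maps each outcome to its criteria, and then invert it before calling .map: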

response_rules = {
"approved": ["c_approved"],
"rejected": ["c_rejected", "a_rejected"],
"revised": ["a_approved"]
}
# invert dictionary
inverted_rules = {vv: k for k, v in response_rules.items() for vv in v}
# same as before
df["final_status"] = df.ffill(axis=1)["status_3"].map(inverted_rules)
print(df)
                  status_1    status_2    status_3 final_status
0  a_accepted_with_comment   a_revised  c_approved     approved
1  a_accepted_with_comment  c_rejected         NaN     rejected
2               a_rejected  a_approved         NaN      revised
3               a_rejected         NaN         NaN     rejected

# Just so you can see:
print(inverted_rules) 
{'a_approved': 'revised',
'a_rejected': 'rejected',
'c_approved': 'approved',
'c_rejected': 'rejected'}
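
One thing to be aware of with either dictionary: Series.map returns NaN for any value that is not a key, so a row whose last status is, say, a_revised would end up with a missing final_status. A small sketch of making that explicit, assuming you want a placeholder label (the "unknown" label and the sample values here are made up for illustration):

```python
import pandas as pd

inverted_rules = {
    "c_approved": "approved",
    "c_rejected": "rejected",
    "a_approved": "revised",
    "a_rejected": "rejected",
}

# Hypothetical last-status values; "a_revised" has no rule
last_status = pd.Series(["c_approved", "a_revised", "a_rejected"])

# Values without a matching key come back as NaN
mapped = last_status.map(inverted_rules)

# Collect the statuses that no rule covers, then fill in a placeholder
unmapped = last_status[mapped.isna()].unique()
final_status = mapped.fillna("unknown")

print(list(unmapped))
print(final_status.tolist())
```

Printing the unmapped values first is a cheap way to spot rules that are missing from the dictionary.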

Let's try np.select together with ffill:

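import numpy as np

# axis has to be passed by keyword in recent pandas versions
s = df.ffill(axis=1).iloc[:,-1]
c1 = s=='c_approved'
c2 = s.isin(['c_rejected','a_rejected'])
c3 = s=='a_approved'
# a string default avoids mixing np.select's numeric default (0) with string choices;
# 'unknown' is only a placeholder label
df['new'] = np.select([c1,c2,c3],['approved','rejected','revised'],default='unknown')
df
Out[210]: 
                  status_1    status_2    status_3       new
0  a_accepted_with_comment   a_revised  c_approved  approved
1  a_accepted_with_comment  c_rejected         NaN  rejected
2               a_rejected  a_approved         NaN   revised
3               a_rejected         NaN         NaN  rejected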
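
A self-contained version of the np.select approach that can be run on the sample data, as a sketch. It limits the forward-fill to the three status columns, so that `.iloc[:, -1]` always picks status_3 even if the frame has other columns, and it passes an explicit string `default`, because np.select otherwise falls back to 0, which is rarely what you want alongside string choices. The "unknown" label and the `status_cols` name are assumptions made for this example.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question (assumes missing values are real NaN)
df = pd.DataFrame({
    "status_1": ["a_accepted_with_comment", "a_accepted_with_comment",
                 "a_rejected", "a_rejected"],
    "status_2": ["a_revised", "c_rejected", "a_approved", np.nan],
    "status_3": ["c_approved", np.nan, np.nan, np.nan],
})

# Illustrative name: the columns that take part in the rule
status_cols = ["status_1", "status_2", "status_3"]

# Last non-missing status per row
s = df[status_cols].ffill(axis=1).iloc[:, -1]

# Conditions are checked in order; the first match wins
conditions = [
    s == "c_approved",
    s.isin(["c_rejected", "a_rejected"]),
    s == "a_approved",
]
choices = ["approved", "rejected", "revised"]

# Rows matching no condition get the placeholder label
df["new"] = np.select(conditions, choices, default="unknown")
print(df)
```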
