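I have a DataFrame to which I apply "if" conditions to return certain values. I want to create a new column holding those values, but when more than one condition is met, I want the column to contain all of the returned values.
For example, given the following DataFrame: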
import pandas as pd

sample = pd.DataFrame({
    'Status': ('reliable', 'non-reliable', 'reliable', 'non-reliable', 'reliable', 'reliable', 'non-reliable'),
    'Gender': ('M', 'M', 'F', 'M', 'F', 'M', 'F'),
    'Domain': ('Yes', 'No', 'Yes', 'No', 'Yes', 'No', 'Yes'),
    'Paid': ('Paid', 'Paid', 'Paid', 'Not Paid', 'Paid', 'Not Paid', 'Paid')
})
The sample conditions are below. For example, if Status is reliable and Gender is F, the new column should contain both return values, "reliable True" and "F True".
def sample_column(row):
    # Each branch returns immediately, so only the first match is reported.
    if row['Status'] == 'reliable':
        return 'reliable True'
    if row['Gender'] == 'F':
        return 'F True'
    if row['Domain'] == 'Yes':
        return 'Domain True'
Finally, build the column:
sample = sample.assign(True_cases = sample.apply(sample_column,axis=1))
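I found a sample solution here (but I could not reproduce it): "Check every condition in Python if/else even if one evaluates to true".
Any help with this would be greatly appreciated.
The simplest approach is to build a mask and then join the results of a row-by-row selection: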
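# Value that counts as a match in each column.
conds = {
    'Status': 'reliable',
    'Gender': 'M',
    'Domain': 'Yes',
    'Paid': 'Paid'
}
# Boolean mask shaped like sample (assumes sample holds only the columns in conds).
mask = pd.DataFrame().reindex_like(sample)
for c in mask.columns:
    mask[c] = sample[c] == conds[c]
# For each row, keep the matching values and join them as '<value> True'.
sample['True Column'] = [
    ' '.join([
        '{} True'.format(s) for s in sample.loc[i, mask.loc[i]]
    ]) for i in sample.index
]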
I'm using a relatively clunky double for-loop here, but you could wrap the string formatting in a function, which may perform better. The result is:
  Domain Gender      Paid        Status
0    Yes      M      Paid      reliable
1     No      M      Paid  non-reliable
2    Yes      F      Paid      reliable
3     No      M  Not Paid  non-reliable
4    Yes      F      Paid      reliable
5     No      M  Not Paid      reliable
6    Yes      F      Paid  non-reliable
                               True Column
0  Yes True M True Paid True reliable True
1                         M True Paid True
2         Yes True Paid True reliable True
3                                   M True
4         Yes True Paid True reliable True
5                     M True reliable True
6                       Yes True Paid True
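The idea of moving the string formatting into a function could look roughly like the sketch below. The helper name `describe_matches` is hypothetical (it is not part of pandas), and the labels come out in the order of the keys in `conds`, which may differ from the column order printed above.

```python
import pandas as pd

sample = pd.DataFrame({
    'Status': ('reliable', 'non-reliable', 'reliable', 'non-reliable', 'reliable', 'reliable', 'non-reliable'),
    'Gender': ('M', 'M', 'F', 'M', 'F', 'M', 'F'),
    'Domain': ('Yes', 'No', 'Yes', 'No', 'Yes', 'No', 'Yes'),
    'Paid': ('Paid', 'Paid', 'Paid', 'Not Paid', 'Paid', 'Not Paid', 'Paid')
})

conds = {'Status': 'reliable', 'Gender': 'M', 'Domain': 'Yes', 'Paid': 'Paid'}


def describe_matches(row, conds):
    """Hypothetical helper: join '<value> True' for every column whose value matches."""
    return ' '.join(
        '{} True'.format(row[col]) for col, wanted in conds.items() if row[col] == wanted
    )


# apply() forwards the extra positional argument to the helper for each row.
sample['True Column'] = sample.apply(describe_matches, axis=1, args=(conds,))
print(sample['True Column'])
```

Because the helper only reads the columns named in `conds`, it keeps working if `sample` already carries other columns.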
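Edit
I'm not sure what the purpose of this is, but Pandas doesn't seem to be the best tool for this kind of output? IMHO, if you are after long human-readable strings, you shouldn't try to put them in a DataFrame.
Anyway, if the formatting varies per condition, this can be done as an extension of my original solution by passing custom formatting functions: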
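# Each entry maps a column to (value to match, function that formats the match).
conds = {
    'Status': ('reliable', lambda s: 'The status is {}'.format(s)),
    'Gender': ('M', lambda s: 'The gender is {}'.format(s)),
    'Domain': ('Yes', lambda s: 'Hello'),
    'Paid': ('Paid', lambda s: 'The bill has been settled')
}
# Assumes sample holds only the columns in conds (drop any 'True Column' added earlier).
mask = pd.DataFrame().reindex_like(sample)
for c in mask.columns:
    mask[c] = sample[c] == conds[c][0]
# Series.items() replaces iteritems(), which was removed in pandas 2.0.
sample['True Column'] = [
    ' '.join([
        conds[c][1](s) for c, s in sample.loc[i, mask.loc[i]].items()
    ]) for i in sample.index
]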
Otherwise, you can keep your function, but append each matching statement to a list and join the list at the end:
def sample_column(row):
    ol = []
    if row['Status'] == 'reliable':
        ol.append('reliable True')
    if row['Gender'] == 'F':
        ol.append('F True')
    if row['Domain'] == 'Yes':
        ol.append('Domain True')
    return ' '.join(ol)

sample['True Column'] = sample.apply(sample_column, axis=1)
You can use numpy's where function and chain your conditions with &. See np.where in the code below:
import numpy as np
import pandas as pd

sample = pd.DataFrame({
    'Status': ('reliable', 'non-reliable', 'reliable', 'non-reliable', 'reliable', 'reliable', 'non-reliable'),
    'Gender': ('M', 'M', 'F', 'M', 'F', 'M', 'F'),
    'Domain': ('Yes', 'No', 'Yes', 'No', 'Yes', 'No', 'Yes'),
    'Paid': ('Paid', 'Paid', 'Paid', 'Not Paid', 'Paid', 'Not Paid', 'Paid')
})

sample['True_Column'] = np.where(
    (sample['Status'] == 'reliable') &
    (sample['Gender'] == 'F') &
    (sample['Domain'] == 'Yes'),
    'True', 'False')
print(sample)
#          Status Gender Domain      Paid True_Column
#0       reliable      M    Yes      Paid       False
#1   non-reliable      M     No      Paid       False
#2       reliable      F    Yes      Paid        True
#3   non-reliable      M     No  Not Paid       False
#4       reliable      F    Yes      Paid        True
#5       reliable      M     No  Not Paid       False
#6   non-reliable      F    Yes      Paid       False
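The np.where call above produces a single flag for the combined condition. If the goal is instead to list every condition that holds, one possible variation (a sketch only, not part of the answer above) is to call np.where once per condition and concatenate the pieces. The `rules` list and its labels are assumptions taken from the question's wording.

```python
import numpy as np
import pandas as pd

sample = pd.DataFrame({
    'Status': ('reliable', 'non-reliable', 'reliable', 'non-reliable', 'reliable', 'reliable', 'non-reliable'),
    'Gender': ('M', 'M', 'F', 'M', 'F', 'M', 'F'),
    'Domain': ('Yes', 'No', 'Yes', 'No', 'Yes', 'No', 'Yes'),
    'Paid': ('Paid', 'Paid', 'Paid', 'Not Paid', 'Paid', 'Not Paid', 'Paid')
})

# (column, value to match, label to emit) -- assumed from the question's three conditions.
rules = [
    ('Status', 'reliable', 'reliable True'),
    ('Gender', 'F', 'F True'),
    ('Domain', 'Yes', 'Domain True'),
]

labels = pd.Series('', index=sample.index)
for column, value, label in rules:
    # Emit the label (plus a separating space) where the condition holds, else nothing.
    piece = pd.Series(np.where(sample[column] == value, label + ' ', ''), index=sample.index)
    labels = labels + piece

# Remove the trailing separator left by the last matching label.
sample['True_Column'] = labels.str.strip()
print(sample)
```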
For more complex conditions, you can use np.select.
I'm not quite sure how you decide on F True, reliable True and Domain True, so you will need to be more specific about the logic.
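As a rough illustration of np.select, the sketch below assumes a particular priority order for the conditions, since the intended logic is not spelled out in the question. Note that np.select returns the choice for the first condition that matches in each row, so combined cases have to be listed explicitly and placed before the simpler ones.

```python
import numpy as np
import pandas as pd

sample = pd.DataFrame({
    'Status': ('reliable', 'non-reliable', 'reliable', 'non-reliable', 'reliable', 'reliable', 'non-reliable'),
    'Gender': ('M', 'M', 'F', 'M', 'F', 'M', 'F'),
    'Domain': ('Yes', 'No', 'Yes', 'No', 'Yes', 'No', 'Yes'),
    'Paid': ('Paid', 'Paid', 'Paid', 'Not Paid', 'Paid', 'Not Paid', 'Paid')
})

# Conditions are checked in order; the first one that holds decides the label.
conditions = [
    (sample['Status'] == 'reliable') & (sample['Gender'] == 'F'),
    sample['Status'] == 'reliable',
    sample['Gender'] == 'F',
]
choices = ['reliable True F True', 'reliable True', 'F True']

# default is used for rows where none of the conditions holds.
sample['True_Column'] = np.select(conditions, choices, default='')
print(sample)
```

With many independent conditions the number of combinations grows quickly, so this form fits best when only a few combined cases matter.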