Return all "if" condition values in a separate column



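I have a DataFrame to which I apply "if" conditions to return certain values. I want to create a new column holding those values, but when several conditions are met, I want that column to contain all of the returned values.

For example, given the following DataFrame: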

sample = pd.DataFrame({'Status':('reliable','non-reliable','reliable','non-reliable','reliable','reliable','non-reliable'),
'Gender': ('M','M','F','M','F','M','F'),
'Domain': ('Yes','No','Yes','No','Yes','No','Yes'),
'Paid': ('Paid','Paid','Paid','Not Paid','Paid','Not Paid','Paid')
})

The sample conditions are shown below. For example, if Status is 'reliable' and Gender is 'F', the new column should contain both return values, 'reliable True' and 'F True'.

def sample_column(row):
    # Returns at the first matching condition, so later matches are never reported
    if row['Status'] == 'reliable':
        return 'reliable True'
    if row['Gender'] == 'F':
        return 'F True'
    if row['Domain'] == 'Yes':
        return 'Domain True'

Finally, I build the column:

sample = sample.assign(True_cases = sample.apply(sample_column,axis=1))

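I found a sample solution here, but I could not reproduce it: "Check every condition in Python if/else even if one evaluates to true".

Any help with this would be much appreciated.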

The simplest approach is to build a boolean mask and then join the results of a row-by-row selection:

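conds = {
    'Status': 'reliable',
    'Gender': 'M',
    'Domain': 'Yes',
    'Paid': 'Paid'
}
# Boolean mask with the same shape as sample: True where a cell equals its target value
mask = pd.DataFrame().reindex_like(sample)
for c in mask.columns:
    mask[c] = sample[c] == conds[c]
# For each row, select the matching cells and join their labels
sample['True Column'] = [
    ' '.join([
        '{} True'.format(s) for s in sample.loc[i, mask.loc[i]]
    ]) for i in sample.index
]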

I used a relatively clumsy double for-loop here, but you could wrap the string formatting in a function for better performance. The result is:

  Domain Gender      Paid        Status                              True Column
0    Yes      M      Paid      reliable  Yes True M True Paid True reliable True
1     No      M      Paid  non-reliable                         M True Paid True
2    Yes      F      Paid      reliable         Yes True Paid True reliable True
3     No      M  Not Paid  non-reliable                                   M True
4    Yes      F      Paid      reliable         Yes True Paid True reliable True
5     No      M  Not Paid      reliable                     M True reliable True
6    Yes      F      Paid  non-reliable                       Yes True Paid True
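The remark about wrapping the string formatting in a function could be sketched roughly as follows. This is only one possible reading of that suggestion: format_match and matched_labels are hypothetical helper names, not part of pandas, and the labels come out in the order of the conds dictionary rather than in the column order shown in the table above.

```python
import pandas as pd

sample = pd.DataFrame({
    'Status': ('reliable', 'non-reliable', 'reliable', 'non-reliable',
               'reliable', 'reliable', 'non-reliable'),
    'Gender': ('M', 'M', 'F', 'M', 'F', 'M', 'F'),
    'Domain': ('Yes', 'No', 'Yes', 'No', 'Yes', 'No', 'Yes'),
    'Paid': ('Paid', 'Paid', 'Paid', 'Not Paid', 'Paid', 'Not Paid', 'Paid'),
})

conds = {'Status': 'reliable', 'Gender': 'M', 'Domain': 'Yes', 'Paid': 'Paid'}


def format_match(value):
    # Hypothetical helper: turn a matched cell value into its label.
    return '{} True'.format(value)


def matched_labels(row, conds):
    # Hypothetical helper: join the labels of every column whose value
    # equals its target in conds (order follows the conds dictionary).
    return ' '.join(
        format_match(row[col]) for col, target in conds.items()
        if row[col] == target
    )


# Build the labels row by row without modifying sample itself.
labels = sample.apply(lambda row: matched_labels(row, conds), axis=1)
print(labels)
```

Keeping the formatting in its own function means the label text can be changed in one place without touching the matching logic.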

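Edit

I'm not sure what the purpose of this is, but Pandas does not seem to be the best tool for this kind of output. IMHO, if you are after long, human-readable strings, you should not try to put them into a DataFrame.

Anyway, if the formatting varies per condition, this can be done as an extension of my original solution by passing custom formatting functions: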

conds = {
    # column: (target value, function that formats a matched value)
    'Status': ('reliable', lambda s: 'The status is {}'.format(s)),
    'Gender': ('M', lambda s: 'The gender is {}'.format(s)),
    'Domain': ('Yes', lambda s: 'Hello'),
    'Paid': ('Paid', lambda s: 'The bill has been settled')
}
mask = pd.DataFrame().reindex_like(sample)
for c in mask.columns:
    mask[c] = sample[c] == conds[c][0]
# Series.iteritems() was removed in pandas 2.0; items() is the replacement
sample['True Column'] = [
    ' '.join([
        conds[c][1](s) for c, s in sample.loc[i, mask.loc[i]].items()
    ]) for i in sample.index
]

Otherwise, you can keep your function, but append each matching statement to a list and join the list at the end:

def sample_column(row):
    ol = []  # collects the label of every condition that matches
    if row['Status'] == 'reliable':
        ol.append('reliable True')
    if row['Gender'] == 'F':
        ol.append('F True')
    if row['Domain'] == 'Yes':
        ol.append('Domain True')
    return ' '.join(ol)

sample['True Column'] = sample.apply(sample_column, axis=1)
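If the list of conditions is likely to grow, the same append-and-join idea could be driven by data instead of hard-coded if statements. The following is a minimal sketch under that assumption: the rules list, the parts frame and the true_column name are illustrative only. Each comparison runs on a whole column at once, and only the final join is done per row.

```python
import numpy as np
import pandas as pd

sample = pd.DataFrame({
    'Status': ('reliable', 'non-reliable', 'reliable', 'non-reliable',
               'reliable', 'reliable', 'non-reliable'),
    'Gender': ('M', 'M', 'F', 'M', 'F', 'M', 'F'),
    'Domain': ('Yes', 'No', 'Yes', 'No', 'Yes', 'No', 'Yes'),
    'Paid': ('Paid', 'Paid', 'Paid', 'Not Paid', 'Paid', 'Not Paid', 'Paid'),
})

# Hypothetical rule table: (column, value to match, label to report).
rules = [
    ('Status', 'reliable', 'reliable True'),
    ('Gender', 'F', 'F True'),
    ('Domain', 'Yes', 'Domain True'),
]

# One column per rule: the label where the rule matches, '' elsewhere.
parts = pd.DataFrame(
    {label: np.where(sample[col] == value, label, '')
     for col, value, label in rules},
    index=sample.index,
)

# Join the non-empty labels of each row with single spaces.
true_column = parts.apply(lambda r: ' '.join(s for s in r if s), axis=1)
print(true_column)
```

Rows where no rule matches end up with an empty string, which mirrors what the function above returns when its list stays empty.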

You can use numpy's where function and chain your conditions with &.
See np.where in the code below:

import pandas as pd
sample = pd.DataFrame({'Status':('reliable','non-reliable','reliable','non-reliable','reliable','reliable','non-reliable'),
'Gender': ('M','M','F','M','F','M','F'),
'Domain': ('Yes','No','Yes','No','Yes','No','Yes'),
'Paid': ('Paid','Paid','Paid','Not Paid','Paid','Not Paid','Paid')
})
import numpy as np
sample['True_Column'] = np.where( 
(sample['Status']=='reliable') & 
(sample['Gender']=='F') & 
(sample['Domain']=='Yes'), 
'True', 'False')
print (sample)
#         Status Gender Domain      Paid True_Column
#0      reliable      M    Yes      Paid       False
#1  non-reliable      M     No      Paid       False
#2      reliable      F    Yes      Paid        True
#3  non-reliable      M     No  Not Paid       False
#4      reliable      F    Yes      Paid        True
#5      reliable      M     No  Not Paid       False
#6  non-reliable      F    Yes      Paid       False

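For more complex conditions, you can use np.select.
I'm not quite sure how your logic for producing 'F True', 'reliable True' and 'Domain True' is supposed to work, so you will need to be more specific about it.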
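As a rough illustration of np.select, here is a sketch that assumes the goal is a single category per row. The condition list, the choice labels and the category variable are made up for this example. Note that np.select uses the first condition that is true for each row, so it suits mutually exclusive categories rather than collecting every match.

```python
import numpy as np
import pandas as pd

sample = pd.DataFrame({
    'Status': ('reliable', 'non-reliable', 'reliable', 'non-reliable',
               'reliable', 'reliable', 'non-reliable'),
    'Gender': ('M', 'M', 'F', 'M', 'F', 'M', 'F'),
    'Domain': ('Yes', 'No', 'Yes', 'No', 'Yes', 'No', 'Yes'),
    'Paid': ('Paid', 'Paid', 'Paid', 'Not Paid', 'Paid', 'Not Paid', 'Paid'),
})

# Conditions are checked in order; the most specific one goes first
# because only the first match per row is used.
conditions = [
    (sample['Status'] == 'reliable') & (sample['Gender'] == 'F'),
    sample['Status'] == 'reliable',
    sample['Gender'] == 'F',
]
# Illustrative labels, one per condition.
choices = ['reliable and F', 'reliable only', 'F only']

# default must be a string here so it shares a dtype with the choices.
category = np.select(conditions, choices, default='neither')
print(category)
```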
