Check whether column values are in a list and report them in a new column



Following this discussion, I have the following DataFrame:

import pandas as pd

data = {'Item': ['1', '2', '3', '4', '5'],
        'Len': [142, 11, 50, 60, 12],
        'Hei': [55, 65, 130, 14, 69],
        'C': [68, -18, 65, 16, 17],
        'Thick': [60, 0, -150, 170, 130],
        'Vol': [230, 200, -500, 10, 160],
        'Fail': [['Len', 'Thick'], ['Thick'], ['Hei', 'Thick', 'Vol'], ['Vol'], ""]}
df = pd.DataFrame(data)

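It represents different items and the values of some of their parameters (Len, Hei, C, …). The Fail column reports the parameters that failed: for example, item 1 fails on Len and Thick, item 3 fails on Hei, Thick and Vol, and item 5 has no failures. For each item I need a new column that reports the failed parameters together with their values, in the format failed parameter = value. So for the first item I should get Len=142 and Thick=60. So far I have expanded the Fail column into multiple columns: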

# Expand each Fail list into separate columns, one per failed parameter.
failed_param = df['Fail'].apply(pd.Series)
failed_param = failed_param.rename(columns=lambda x: 'Failed_param_' + str(x + 1))
df2_list = failed_param.columns.values.tolist()
df2 = pd.concat([df, failed_param], axis=1)

Then, if I do the following:

for name in df2_list:
    df2.loc[df2[name] == "Thick", "new"] = "Thick" + "=" + df2["Thick"].map(str)

I get what I need, but only for one parameter (Thick in this case). How can I get the same result for all parameters at once?

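As mentioned in the question, you need to insert a new column (for example, FailParams) that holds one string per item. Each string lists that item's failures (for example, Len=142,Thick=60). A quick solution could be: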

import pandas as pd

data = {
    'Item' : ['1', '2', '3', '4', '5'],
    'Len'  : [142, 11, 50, 60, 12],
    'Hei'  : [55, 65, 130, 14, 69],
    'C'    : [68, -18, 65, 16, 17],
    'Thick': [60, 0, -150, 170, 130],
    'Vol'  : [230, 200, -500, 10, 160],
    'Fail' : [['Len', 'Thick'], ['Thick'], ['Hei', 'Thick', 'Vol'], ['Vol'], []]
}
# Convert the dictionary into a DataFrame.
df = pd.DataFrame(data)
# The first solution: using list comprehension.
# Note: int(L[0]) - 1 turns the Item label into a row label, so this relies on
# the items being numbered 1..n in row order with the default index.
column = [
    ",".join(  # Add commas between the list items.
        # Find the target items and their values.
        [el + "=" + str(df.loc[int(L[0]) - 1, el]) for el in L[1]]
    )
    if (len(L[1]) > 0) else ""  # If the inner Fail list is empty, return an empty string.
    for L in zip(df['Item'].values, df['Fail'].values)  # Loop on the Fail items.
]
# Insert the new column.
df['FailParams'] = column
# Print the DF after insertion.
print(df)

The solution above uses a list comprehension. Another solution, using a loop, is:

# The second solution: using loops.
records = []
for L in zip(df['Item'].values, df['Fail'].values):
    if (len(L[1]) <= 0):
        record = ""
    else:
        record = ",".join([el + "=" + str(df.loc[int(L[0]) - 1, el]) for el in L[1]])
    records.append(record)
print(records)
# Insert the new column.
df['FailParams'] = records
# Print the DF after insertion.
print(df)

A sample output would be:

  Item  Len  Hei   C  Thick  Vol               Fail                   FailParams
0    1  142   55  68     60  230       [Len, Thick]             Len=142,Thick=60
1    2   11   65 -18      0  200            [Thick]                      Thick=0
2    3   50  130  65   -150 -500  [Hei, Thick, Vol]  Hei=130,Thick=-150,Vol=-500
3    4   60   14  16    170   10              [Vol]                       Vol=10
4    5   12   69  17    130  160                 []
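Both versions above find the row with int(L[0]) - 1, which only works while the Item labels line up with the default row index. As a hedged alternative (a sketch, not part of the original answer), the same strings can be built row by row with DataFrame.apply, so each value is read from the row that is currently being processed. The helper name fail_params is made up for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Item':  ['1', '2', '3', '4', '5'],
    'Len':   [142, 11, 50, 60, 12],
    'Hei':   [55, 65, 130, 14, 69],
    'C':     [68, -18, 65, 16, 17],
    'Thick': [60, 0, -150, 170, 130],
    'Vol':   [230, 200, -500, 10, 160],
    'Fail':  [['Len', 'Thick'], ['Thick'], ['Hei', 'Thick', 'Vol'], ['Vol'], []],
})

def fail_params(row):
    # Hypothetical helper: row['Fail'] lists the failed parameter names,
    # and each name is looked up in the same row. An empty list gives "".
    return ",".join(f"{name}={row[name]}" for name in row['Fail'])

# axis=1 passes one row at a time, so no index arithmetic is needed.
df['FailParams'] = df.apply(fail_params, axis=1)
print(df['FailParams'].tolist())
```

Because the lookup never touches the index, this sketch should keep working if the rows are reordered or the Item labels are not consecutive integers.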

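It may be a good idea to build an intermediate representation first, like this (I am assuming that the empty cell in the Fail column is an empty list [], so that it matches the data type of the other values):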

# create a Boolean mask to filter failed values
m = df.apply(lambda row: row.index.isin(row.Fail),
             axis=1,
             result_type='broadcast')

>>> df[m]
  Item    Len    Hei   C  Thick    Vol Fail
0  NaN  142.0    NaN NaN   60.0    NaN  NaN
1  NaN    NaN    NaN NaN    0.0    NaN  NaN
2  NaN    NaN  130.0 NaN -150.0 -500.0  NaN
3  NaN    NaN    NaN NaN    NaN   10.0  NaN
4  NaN    NaN    NaN NaN    NaN    NaN  NaN

This also lets you do something with the failed values.
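As one illustration of working with the failed values, the sketch below counts failures per parameter and per item. It is an assumption-laden example rather than the answer's own code: the mask is built with a plain nested comprehension instead of the apply call with result_type='broadcast', and params is an assumed list of the numeric parameter columns.

```python
import pandas as pd

df = pd.DataFrame({
    'Item':  ['1', '2', '3', '4', '5'],
    'Len':   [142, 11, 50, 60, 12],
    'Hei':   [55, 65, 130, 14, 69],
    'C':     [68, -18, 65, 16, 17],
    'Thick': [60, 0, -150, 170, 130],
    'Vol':   [230, 200, -500, 10, 160],
    'Fail':  [['Len', 'Thick'], ['Thick'], ['Hei', 'Thick', 'Vol'], ['Vol'], []],
})

# Assumed list of parameter columns (everything except Item and Fail).
params = ['Len', 'Hei', 'C', 'Thick', 'Vol']

# True where the column name appears in that row's Fail list.
mask = pd.DataFrame(
    [[p in fails for p in params] for fails in df['Fail']],
    index=df.index,
    columns=params,
)

# Keep only the failed values; everything else becomes NaN.
failed = df[params].where(mask)

fail_counts = failed.count()        # failures per parameter
fails_per_item = mask.sum(axis=1)   # failures per item
print(fail_counts)
print(fails_per_item)
```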

With this in place, the list of values can be generated in much the same way as in Hossam Magdy Balaha's answer, perhaps with a small function:

def join_params(row):
    row = row.dropna().to_dict()
    return ', '.join(f'{k}={v}' for k,v in row.items())

>>> df[m].apply(join_params, axis=1)
0                  Len=142.0, Thick=60.0
1                              Thick=0.0
2    Hei=130.0, Thick=-150.0, Vol=-500.0
3                               Vol=10.0
4                                       
dtype: object
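The values print as 142.0 rather than 142 because masking with NaN turns the integer columns into floats. If the integer form from the question is wanted, one possible variation (a sketch that assumes every parameter value is a whole number) converts each surviving value back with int before formatting. The names join_params_int and params are hypothetical, and the mask is again built with a plain comprehension.

```python
import pandas as pd

df = pd.DataFrame({
    'Item':  ['1', '2', '3', '4', '5'],
    'Len':   [142, 11, 50, 60, 12],
    'Hei':   [55, 65, 130, 14, 69],
    'C':     [68, -18, 65, 16, 17],
    'Thick': [60, 0, -150, 170, 130],
    'Vol':   [230, 200, -500, 10, 160],
    'Fail':  [['Len', 'Thick'], ['Thick'], ['Hei', 'Thick', 'Vol'], ['Vol'], []],
})

params = ['Len', 'Hei', 'C', 'Thick', 'Vol']  # assumed parameter columns
mask = pd.DataFrame(
    [[p in fails for p in params] for fails in df['Fail']],
    index=df.index,
    columns=params,
)

def join_params_int(row):
    # Drop the non-failed (NaN) entries, then format each value as an integer.
    # Assumption: all parameter values are whole numbers.
    row = row.dropna()
    return ', '.join(f'{k}={int(v)}' for k, v in row.items())

result = df[params].where(mask).apply(join_params_int, axis=1)
print(result.tolist())
```

If some parameters can hold fractional values, the int conversion would truncate them, so the original f'{k}={v}' formatting is the safer choice in that case.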
