Following a discussion, I have the following DataFrame:
import pandas as pd

data = {'Item': ['1', '2', '3', '4', '5'],
        'Len': [142, 11, 50, 60, 12],
        'Hei': [55, 65, 130, 14, 69],
        'C': [68, -18, 65, 16, 17],
        'Thick': [60, 0, -150, 170, 130],
        'Vol': [230, 200, -500, 10, 160],
        'Fail': [['Len', 'Thick'], ['Thick'], ['Hei', 'Thick', 'Vol'], ['Vol'], ""]}
df = pd.DataFrame(data)
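It represents different items and the corresponding values of some of their parameters (Len, Hei, C, ...). The Fail column reports the parameters that failed: for example, for item 1 the parameters Len and Thick failed, for item 3 the parameters Hei, Thick and Vol failed, and item 5 has no failures. For each item I need a new column that reports the failed parameters together with their values, in the format failed parameter = value. So for the first item I should get Len=142 and Thick=60. So far I have split the Fail column into multiple columns: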
failed_param = df['Fail'].apply(pd.Series)
failed_param = failed_param.rename(columns = lambda x : 'Failed_param_' + str(x +1 ))
df2_list = failed_param.columns.values.tolist()
df2 = pd.concat([df[:], failed_param[:]], axis=1)
Then, if I do the following:
for name in df2_list:
    df2.loc[df2[f"{name}"] == "D", "new"] = "D" + "=" + df2["D"].map(str)
I get what I need, but only for a single parameter (D in this case). How can I get the same result for all parameters at once?
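As mentioned in the question, you need to insert a new column (e.g., FailParams) that holds a list of strings, one per item. Each string lists that item's failures (e.g., Len=142,Thick=60). A quick solution can be: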
import pandas as pd
data = {
    'Item' : ['1', '2', '3', '4', '5'],
    'Len'  : [142, 11, 50, 60, 12],
    'Hei'  : [55, 65, 130, 14, 69],
    'C'    : [68, -18, 65, 16, 17],
    'Thick': [60, 0, -150, 170, 130],
    'Vol'  : [230, 200, -500, 10, 160],
    'Fail' : [['Len', 'Thick'], ['Thick'], ['Hei', 'Thick', 'Vol'], ['Vol'], []]
}
# Convert the dictionary into a DataFrame.
df = pd.DataFrame(data)
# The first solution: using list comprehension.
# Note: df.loc[int(L[0]) - 1, el] assumes the Item numbers run 1..n in row order,
# so that Item - 1 equals the default integer index.
column = [
    ",".join(  # Add commas between the list items.
        # Find the target items and their values.
        [el + "=" + str(df.loc[int(L[0]) - 1, el]) for el in L[1]]
    )
    if (len(L[1]) > 0) else ""  # If the inner Fail list is empty, return an empty string.
    for L in zip(df['Item'].values, df['Fail'].values)  # Loop over the items and their Fail lists.
]
# Insert the new column.
df['FailParams'] = column
# Print the DF after insertion.
print(df)
The previous solution uses a list comprehension. Another solution, using a loop, is:
# The second solution: using loops.
records = []
for L in zip(df['Item'].values, df['Fail'].values):
    if (len(L[1]) <= 0):
        record = ""
    else:
        record = ",".join([el + "=" + str(df.loc[int(L[0]) - 1, el]) for el in L[1]])
    records.append(record)
print(records)
# Insert the new column.
df['FailParams'] = records
# Print the DF after insertion.
print(df)
A sample output would be:
   Item  Len  Hei    C  Thick   Vol               Fail                   FailParams
0     1  142   55   68     60   230       [Len, Thick]             Len=142,Thick=60
1     2   11   65  -18      0   200            [Thick]                      Thick=0
2     3   50  130   65   -150  -500  [Hei, Thick, Vol]  Hei=130,Thick=-150,Vol=-500
3     4   60   14   16    170    10              [Vol]                       Vol=10
4     5   12   69   17    130   160                 []
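Both variants above look rows up with int(Item) - 1, which only works while the Item numbers line up with the default index. A minimal sketch of a row-wise alternative that does not depend on that, assuming the empty Fail cell is an empty list (the helper name format_failures is made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    "Item":  ["1", "2", "3", "4", "5"],
    "Len":   [142, 11, 50, 60, 12],
    "Hei":   [55, 65, 130, 14, 69],
    "C":     [68, -18, 65, 16, 17],
    "Thick": [60, 0, -150, 170, 130],
    "Vol":   [230, 200, -500, 10, 160],
    "Fail":  [["Len", "Thick"], ["Thick"], ["Hei", "Thick", "Vol"], ["Vol"], []],
})

def format_failures(row):
    # Hypothetical helper: read each failed parameter from the same row,
    # so no positional lookup via the Item column is needed.
    # An empty Fail list yields an empty string.
    return ",".join(f"{name}={row[name]}" for name in row["Fail"])

# axis=1 passes each row to the helper as a Series.
df["FailParams"] = df.apply(format_failures, axis=1)
print(df["FailParams"].tolist())
```

Because each value is read from the row that is being formatted, this should keep working if the frame is filtered, reordered, or given a different index.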
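It may be a good idea to build an intermediate representation first, like this (I assume that the empty cell in the Fail column is an empty list [], so that it matches the data type of the other values):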
# create a Boolean mask to filter failed values
m = df.apply(lambda row: row.index.isin(row.Fail),
axis=1,
result_type='broadcast')
>>> df[m]
   Item    Len    Hei    C   Thick     Vol  Fail
0   NaN  142.0    NaN  NaN    60.0     NaN   NaN
1   NaN    NaN    NaN  NaN     0.0     NaN   NaN
2   NaN    NaN  130.0  NaN  -150.0  -500.0   NaN
3   NaN    NaN    NaN  NaN     NaN    10.0   NaN
4   NaN    NaN    NaN  NaN     NaN     NaN   NaN
This also lets you do something with the failed values.
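As one illustration of what such an intermediate mask allows, here is a hedged sketch that counts failures per item and per parameter. It builds the Boolean mask explicitly with a nested comprehension instead of apply, and the names params, mask, per_item, per_param and failed_values are chosen here for illustration only:

```python
import pandas as pd

df = pd.DataFrame({
    "Item":  ["1", "2", "3", "4", "5"],
    "Len":   [142, 11, 50, 60, 12],
    "Hei":   [55, 65, 130, 14, 69],
    "C":     [68, -18, 65, 16, 17],
    "Thick": [60, 0, -150, 170, 130],
    "Vol":   [230, 200, -500, 10, 160],
    "Fail":  [["Len", "Thick"], ["Thick"], ["Hei", "Thick", "Vol"], ["Vol"], []],
})

# Assumption: these are the only parameter columns of interest.
params = ["Len", "Hei", "C", "Thick", "Vol"]

# One row per item, one Boolean column per parameter: True where it failed.
mask = pd.DataFrame(
    [[p in fails for p in params] for fails in df["Fail"]],
    index=df.index,
    columns=params,
)

per_item = mask.sum(axis=1)    # number of failed parameters for each item
per_param = mask.sum(axis=0)   # number of items that failed each parameter

# Keep only the failed values; everything else becomes NaN.
failed_values = df[params].where(mask)

print(per_item.tolist())
print(per_param.to_dict())
```

Keeping the mask as its own object means the counts and the masked values can be derived independently of how the final strings are formatted.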
With this masked frame, generating the list of values can be done much as in Hossam Magdy Balaha's answer, perhaps with a small function:
def join_params(row):
    row = row.dropna().to_dict()
    return ', '.join(f'{k}={v}' for k, v in row.items())
>>> df[m].apply(join_params, axis=1)
0                  Len=142.0, Thick=60.0
1                              Thick=0.0
2    Hei=130.0, Thick=-150.0, Vol=-500.0
3                               Vol=10.0
4
dtype: object
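The values above are printed as floats (142.0) because masking introduces NaN, which turns the integer columns into floats. If the original integers should be kept, one possible sketch is to go through a long format with explode instead of a mask. This is a different technique from the one above, and the names long, pairs and fail_params are illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    "Item":  ["1", "2", "3", "4", "5"],
    "Len":   [142, 11, 50, 60, 12],
    "Hei":   [55, 65, 130, 14, 69],
    "C":     [68, -18, 65, 16, 17],
    "Thick": [60, 0, -150, 170, 130],
    "Vol":   [230, 200, -500, 10, 160],
    "Fail":  [["Len", "Thick"], ["Thick"], ["Hei", "Thick", "Vol"], ["Vol"], []],
})

# One row per (item, failed parameter); an empty list becomes NaN and is dropped.
long = df.explode("Fail").dropna(subset=["Fail"])

# For each exploded row, look up the value of the parameter named in Fail.
pairs = [f"{rec['Fail']}={rec[rec['Fail']]}" for rec in long.to_dict("records")]
long = long.assign(pair=pairs)

# Join the pairs per original row; items without failures get an empty string.
fail_params = (
    long.groupby(level=0)["pair"]
        .agg(", ".join)
        .reindex(df.index, fill_value="")
)
df["FailParams"] = fail_params
print(df["FailParams"].tolist())
```

Since no NaN is ever written into the numeric columns, the values are formatted from the original integers.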