How to replace a string in a list if it contains a substring, in a Pandas DataFrame column



I have a df:

import pandas as pd

df = pd.DataFrame({'age': [13, 62, 53, 33],
                   'gender': ['male', 'female', 'male', 'male'],
                   'symptoms': [['acute respiratory distress', 'fever'],
                                ['acute respiratory disease', 'cough'],
                                ['fever'],
                                ['respiratory distress']]})

df

Output:

   age  gender  symptoms
0   13  male    [acute respiratory distress, fever]
1   62  female  [acute respiratory disease, cough]
2   53  male    [fever]
3   33  male    [respiratory distress]

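Each cell in the 'symptoms' column holds a list. I am trying to replace every list element that contains the substring 'respiratory' with the full string 'acute respiratory distress', so that the value is consistent across the whole data frame. This is the desired result: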

Output:
   age  gender  symptoms
0   13  male    [acute respiratory distress, fever]
1   62  female  [acute respiratory distress, cough]
2   53  male    [fever]
3   33  male    [acute respiratory distress]

I tried:

df.loc[df['symptoms'].str.contains('respiratory', na=False), 'symptoms'] = 'acute respiratory distress'
print(df)

However, the data frame remains unchanged.
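A likely reason the attempt has no effect, sketched here on a cut-down copy of the frame (the variable name `mask` is only for illustration): each cell in 'symptoms' is a list rather than a string, so `str.contains` cannot match against it and reports a missing value, which `na=False` then turns into False. The resulting mask selects no rows.

```python
import pandas as pd

df = pd.DataFrame({'symptoms': [['acute respiratory distress', 'fever'],
                                ['acute respiratory disease', 'cough'],
                                ['fever'],
                                ['respiratory distress']]})

# The cells are lists, not strings, so the pattern search fails on every row;
# na=False converts those failed matches into False.
mask = df['symptoms'].str.contains('respiratory', na=False)

# No row is selected, so a .loc assignment through this mask changes nothing.
print(mask.any())
```

So the replacement has to look inside each list rather than at the cell as a whole.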

Like this:

import pandas as pd
df = pd.DataFrame({'age': [13,62,53, 33],
'gender': ['male','female','male', 'male'],
'symptoms': [['acute respiratory distress', 'fever'],
['acute respiratory disease', 'cough'],
['fever'],
['respiratory distress']]})
# Rebuild each list, swapping any element that contains 'respiratory' for the canonical label
df['symptoms'] = [['acute respiratory distress' if 'respiratory' in s else s for s in lst] for lst in df['symptoms']]

print(df)

Output:

   age  gender                             symptoms
0   13    male  [acute respiratory distress, fever]
1   62  female  [acute respiratory distress, cough]
2   53    male                              [fever]
3   33    male         [acute respiratory distress]

Flatten the lists with explode, then assign using str.contains:

>>> s = df.symptoms.explode()
>>> df['symptoms'] = s.mask(s.str.contains('respiratory'),'acute respiratory distress').groupby(level=0).agg(list)
>>> df
age  gender                             symptoms
0   13    male  [acute respiratory distress, fever]
1   62  female  [acute respiratory distress, cough]
2   53    male                              [fever]
3   33    male         [acute respiratory distress]
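The one-liner above packs several steps together. The sketch below spells them out one at a time; the intermediate names `hit` and `fixed`, and the use of `na=False`, are additions for illustration and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'age': [13, 62, 53, 33],
                   'gender': ['male', 'female', 'male', 'male'],
                   'symptoms': [['acute respiratory distress', 'fever'],
                                ['acute respiratory disease', 'cough'],
                                ['fever'],
                                ['respiratory distress']]})

# Step 1: one row per list element; the original row label repeats in the index.
s = df['symptoms'].explode()

# Step 2: the elements are now plain strings, so str.contains can match them.
# na=False is a defensive assumption in case some elements are missing.
hit = s.str.contains('respiratory', na=False)

# Step 3: overwrite the matching elements and leave the others untouched.
fixed = s.mask(hit, 'acute respiratory distress')

# Step 4: regroup by the original row label and rebuild the lists.
df['symptoms'] = fixed.groupby(level=0).agg(list)

print(df)
```

Compared with the list comprehension, this keeps everything in pandas operations, at the cost of building an intermediate Series with one row per symptom.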
