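I have a DataFrame with a column named "Campaign".
If a value in that column contains both "General" and "PHD", I want to create a "Degree" column and fill it with "PHD";
if it contains "General" and "BS", fill in "BS";
if it contains "General" and "MS", fill in "MS";
otherwise np.nan.
I know my code is wrong, but it might give you some idea of what I am trying to do: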
Campaign_Degree = []
for x in data['Campaign']:
    if x.str.contains('General' and 'PHD'):
        data['Campaign_Degree'] == 'PHD'
    if x.str.contains('General' and 'BS'):
        data['Campaign_Degree'] == 'BS'
    if x.str.contains('General' and 'MS'):
        data['Campaign_Degree'] == 'MS'
    else:
        data['Campaign_Degree'] == np.nan
Try something like this:
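import pandas as pd

df = pd.DataFrame({'col': ["General & PHD",
                           "General & BS",
                           "General & MS",
                           "other",
                           "Brand_ABC_PHD",
                           "asGeneraldasPHDasda",
                           "BSasdasdGeneralppp"]})
# Keep only the rows that contain "General", then extract the first degree token.
# Rows dropped by the mask come back as NaN when the result is aligned on the index.
df["new"] = df.col[df.col.str.contains("General")].str.extract("(PHD|BS|MS)", expand=False)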
Output:
df
                   col  new
0        General & PHD  PHD
1         General & BS   BS
2         General & MS   MS
3                other  NaN
4        Brand_ABC_PHD  NaN
5  asGeneraldasPHDasda  PHD
6   BSasdasdGeneralppp   BS
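
If you would rather spell out each condition the way the question does, something along these lines may work. It is only a sketch: the sample rows are made up, the "PHD before BS before MS" priority is an assumption, and the column names "Campaign" and "Campaign_Degree" are taken from the question's code. Note that in Python 'General' and 'PHD' evaluates to just 'PHD', so the two substrings have to be tested separately and combined with &.

```python
import numpy as np
import pandas as pd

# Made-up sample data for illustration; column names follow the question's code.
data = pd.DataFrame({"Campaign": ["General & PHD",
                                  "General & BS",
                                  "General & MS",
                                  "other",
                                  "Brand_ABC_PHD"]})

# `'General' and 'PHD'` is just 'PHD', so each substring is checked on its own.
has_general = data["Campaign"].str.contains("General", regex=False)

# Start with an all-NaN object column, then fill it one degree at a time.
data["Campaign_Degree"] = pd.Series(np.nan, index=data.index, dtype=object)
for degree in ["PHD", "BS", "MS"]:  # assumed priority order
    has_degree = data["Campaign"].str.contains(degree, regex=False)
    # Only fill rows that are still empty, so an earlier degree is not overwritten.
    mask = has_general & has_degree & data["Campaign_Degree"].isna()
    data.loc[mask, "Campaign_Degree"] = degree
```

Rows that contain a degree but not "General" (such as "Brand_ABC_PHD") stay NaN, which matches the rule described in the question.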