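I have a pandas DataFrame in which every column holds string values. I want to combine column 1 and column 2 into a new column, say column 4. However, if the words in column 1 and column 2 are the same, I want to combine column 1 and column 3 into the new column instead.
I tried putting the pairs in a list first and then adding that list as a separate column, but it did not work. I am new to Python, so I think I am missing a simpler solution.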
pairs = []
for row in df['interest1']:
    if row == df['interest2'].iloc[row]:
        pairs.append(df['interest1'] + ' ' + df['interest2'])
    else:
        pairs.append(df['interest1'] + ' ' + df['interest3'])
# a simple example of what I would like to achieve
import pandas as pd
lst = [['music', 'music', 'film', 'music film'],
       ['guitar', 'piano', 'violin', 'guitar piano'],
       ['music', 'photography', 'photography', 'music photography'],
       ]
df = pd.DataFrame(lst, columns=['interest1', 'interest2', 'interest3', 'first distinct pair'])
df
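You can use the where method of a pandas Series. It keeps the calling Series' value wherever the condition is True and substitutes the second argument wherever it is False: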
df['first_distinct_pair'] = (df['interest1'] + df['interest2']).where(df['interest1'] != df['interest2'], df['interest1'] + df['interest3'])
If you want to include a space between the two words, you can do:
df['first_distinct_pair'] = (df['interest1'] + ' '+ df['interest2']).where(df['interest1'] != df['interest2'], df['interest1'] + ' ' + df['interest3'])
The result looks like this:
>>> import pandas as pd
>>> lst = [['music', 'music', 'film'],
...        ['guitar', 'piano', 'violin'],
...        ['music', 'photography', 'photography'],
...        ]
>>> df = pd.DataFrame(lst, columns=['interest1', 'interest2', 'interest3'])
>>> df['first_distinct_pair'] = (df['interest1'] + ' ' + df['interest2']).where(df['interest1'] != df['interest2'], df['interest1'] + ' ' + df['interest3'])
>>> df
   interest1    interest2    interest3  first_distinct_pair
0      music        music         film           music film
1     guitar        piano       violin         guitar piano
2      music  photography  photography    music photography
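As a variation on the same idea, the partner word can be chosen first and concatenated once, which some readers may find easier to follow than building both candidate strings. This is only a sketch: the intermediate name partner is introduced here for illustration and is not part of the original answer.

```python
import pandas as pd

# Same sample data as in the question (without the expected-output column).
df = pd.DataFrame(
    [["music", "music", "film"],
     ["guitar", "piano", "violin"],
     ["music", "photography", "photography"]],
    columns=["interest1", "interest2", "interest3"],
)

# Hypothetical helper Series: take interest2 where it differs from interest1,
# otherwise fall back to interest3.
partner = df["interest2"].where(df["interest1"] != df["interest2"], df["interest3"])

# Concatenate once, with a space between the two words.
df["first_distinct_pair"] = df["interest1"] + " " + partner

print(df)
```

Like the original one-liner, this only checks interest1 against interest2. If interest3 could also repeat interest1, the fallback would still produce a duplicated pair, so that case would need an extra condition.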