What is the best way to combine 2 string columns in pandas into a new column based on a specific condition?

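I have a pandas DataFrame in which every column holds string values. I want to combine column 1 and column 2 into a new column, say column 4. However, if the words in column 1 and column 2 are the same, I want to combine column 1 and column 3 into the new column instead.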

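I tried putting the pairs in a list first and then adding that list as a separate column, but it did not work. I am new to Python, so I think I am missing a simpler solution.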

pairs = []
for row in df['interest1']:
    if row == df['interest2'].iloc[row]:
        pairs.append(df['interest1'] + ' ' + df['interest2'])
    else:
        pairs.append(df['interest1'] + ' ' + df['interest3'])

#a simple example of what I would like to achieve
import pandas as pd
lst= [['music','music','film','music film'],
      ['guitar','piano','violin','guitar piano'],
      ['music','photography','photography','music photography'],
     ]
df= pd.DataFrame(lst,columns=['interest1','interest2','interest3','first distinct pair'])
df

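You can use the pandas where method, which keeps a value where the condition is True and takes the replacement where it is False: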

df['first_distinct_pair'] = (df['interest1'] + df['interest2']).where(df['interest1'] != df['interest2'],  df['interest1'] + df['interest3'])

If you want to include the space, you can do:

df['first_distinct_pair'] = (df['interest1'] + ' '+ df['interest2']).where(df['interest1'] != df['interest2'],  df['interest1'] + ' ' + df['interest3'])

The result looks like this:

>>> import pandas as pd
>>> lst = [['music','music','film'],
...        ['guitar','piano','violin'],
...        ['music','photography','photography'],
...       ]
>>> df = pd.DataFrame(lst, columns=['interest1','interest2','interest3'])
>>> df['first_distinct_pair'] = (df['interest1'] + ' ' + df['interest2']).where(df['interest1'] != df['interest2'], df['interest1'] + ' ' + df['interest3'])
>>> df
  interest1    interest2    interest3 first_distinct_pair
0     music        music         film          music film
1    guitar        piano       violin        guitar piano
2     music  photography  photography   music photography
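
For comparison, here is a minimal sketch of the same selection done three ways: with where as above, with numpy.where, and with a row-by-row loop closer to the attempt in the question. The variable names (differs, with_where, with_np, pairs) are made up for illustration, and the sketch assumes every cell is a plain string with no missing values.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [['music', 'music', 'film'],
     ['guitar', 'piano', 'violin'],
     ['music', 'photography', 'photography']],
    columns=['interest1', 'interest2', 'interest3'],
)

# True for rows where the first two columns hold different words.
differs = df['interest1'] != df['interest2']

# Series.where keeps the original value where the condition is True
# and takes the second argument where it is False.
with_where = (df['interest1'] + ' ' + df['interest2']).where(
    differs, df['interest1'] + ' ' + df['interest3'])

# numpy.where(condition, value_if_true, value_if_false) gives the same values
# as a NumPy array.
with_np = np.where(differs,
                   df['interest1'] + ' ' + df['interest2'],
                   df['interest1'] + ' ' + df['interest3'])

# Row-by-row version: zip walks the three columns in parallel, so each
# comparison is between values from the same row.
pairs = [a + ' ' + (c if a == b else b)
         for a, b, c in zip(df['interest1'], df['interest2'], df['interest3'])]

df['first_distinct_pair'] = with_where
print(df)
```

All three approaches should give the same pairs for this data. The vectorized versions are usually preferable on larger frames, while the loop may be easier to read for someone coming from plain Python.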
