Compare two list columns in a pandas DataFrame: remove from one list the values that exist in the other



Suppose I have two list columns like this:

import pandas as pd

group1 = [['John', 'Mark'], ['Ben', 'Johnny'], ['Sarah', 'Daniel']]
group2 = [['Aya', 'Boa'], ['Mab', 'Johnny'], ['Sarah', 'Peter']]
df = pd.DataFrame({'group1': group1, 'group2': group2})

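I want to compare the two list columns row by row and remove from group1 any list elements that also appear in group2. The expected result for the data above is: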

group1              group2
['John', 'Mark']    ['Aya', 'Boa']
['Ben']             ['Mab', 'Johnny']
['Daniel']          ['Sarah', 'Peter']

How can I do this? I tried:

df['group1'] = [[name for name in df['group1'] if name not in df['group2']]]

But I got this error:

TypeError: unhashable type: 'list'

Please help.

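You need to zip the two Series so that each row's lists are compared with each other. I convert group2 to a set here for efficiency (this hardly matters if each list has only a few items):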

df['group1'] = [[x for x in a if x not in S]
                for a, S in zip(df['group1'], df['group2'].apply(set))]

Output:

         group1          group2
0  [John, Mark]      [Aya, Boa]
1         [Ben]   [Mab, Johnny]
2      [Daniel]  [Sarah, Peter]
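As far as I can tell, the original attempt fails because `name not in df['group2']` tests membership against the Series index rather than its values, and looking up a list there requires hashing it, hence `unhashable type: 'list'`. A self-contained sketch of the zip approach follows; the helper name `remove_shared` is hypothetical and only used for illustration.

```python
import pandas as pd

group1 = [['John', 'Mark'], ['Ben', 'Johnny'], ['Sarah', 'Daniel']]
group2 = [['Aya', 'Boa'], ['Mab', 'Johnny'], ['Sarah', 'Peter']]
df = pd.DataFrame({'group1': group1, 'group2': group2})


def remove_shared(left, right):
    """Return the items of `left` that are not in `right`, keeping their order.

    `remove_shared` is a hypothetical helper name, not a pandas function.
    """
    right_set = set(right)  # set lookup is O(1) per membership test
    return [x for x in left if x not in right_set]


# Pair each row's group1 list with the same row's group2 list.
df['group1'] = [remove_shared(a, b) for a, b in zip(df['group1'], df['group2'])]
result = df['group1'].tolist()
print(df)
```

Because the comparison happens pairwise per row, a name is only removed when it appears in the same row of group2, and group2 itself is left unchanged.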

You can use a set difference:

df.apply(lambda x: set(x['group1']).difference(x['group2']), axis=1)

Output:

0    {John, Mark}
1           {Ben}
2        {Daniel}
dtype: object

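To get lists instead of sets, append .apply(list) at the end. Note that sets are unordered, so the original order of the names is not guaranteed to be preserved, and duplicate names within a list are collapsed.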
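A minimal sketch of the set-difference variant that ends up with lists is shown here. Using `sorted` instead of `list` is my own substitution, made only so that the resulting order is deterministic (alphabetical); it does not restore the original order of the names.

```python
import pandas as pd

df = pd.DataFrame({
    'group1': [['John', 'Mark'], ['Ben', 'Johnny'], ['Sarah', 'Daniel']],
    'group2': [['Aya', 'Boa'], ['Mab', 'Johnny'], ['Sarah', 'Peter']],
})

# Row-wise set difference: names in group1 that are not in group2.
diff = df.apply(lambda row: set(row['group1']).difference(row['group2']), axis=1)

# Turn each set into a list; sorted() gives a stable, alphabetical order.
as_lists = diff.apply(sorted)
print(as_lists)
```

If the original order or duplicates within a list matter, the list-comprehension approaches in the other answers are the safer choice.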

You can use a loop (a list comprehension) inside a lambda function:

df['group1'] = df[['group1', 'group2']].apply(
    lambda x: [i for i in x['group1'] if i not in x['group2']], axis=1)
print(df)
'''
         group1          group2
0  [John, Mark]      [Aya, Boa]
1         [Ben]   [Mab, Johnny]
2      [Daniel]  [Sarah, Peter]
'''
