pandas: output the difference between two columns of string lists



I have a DataFrame with two columns, as shown below:

import pandas as pd

df = pd.DataFrame({'pos_1': [['VERB', 'PRON', 'DET', 'NOUN', 'ADP'], ['NOUN', 'PRON', 'DET', 'NOUN', 'ADV', 'ADV']],
                   'pos_2': [['VERB', 'PRON', 'DET', 'NOUN', 'ADP'], ['VERB', 'PRON', 'DET', 'NOUN', 'ADV', 'ADV']]})

I am trying to output the difference between these two columns using apply.

df['diff'] = df.apply(lambda x: [i for i in x['pos_1'] if i not in x['pos_2']], axis=1)

I expect the diff column to contain:

diff
1 []
2 ['NOUN','VERB']

But I get two empty lists in the diff column. I can't tell which part I got wrong.

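If you need to compare the lists element by element and return the differences, use zip to compare each pair, then flatten the result with a nested list comprehension: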

f = lambda x: [z for i, j in zip(x['pos_1'],x['pos_2']) if i != j for z in [i, j]]
df['diff'] = df.apply(f, axis=1)
print(df)
                               pos_1                              pos_2          diff
0       [VERB, PRON, DET, NOUN, ADP]       [VERB, PRON, DET, NOUN, ADP]            []
1  [NOUN, PRON, DET, NOUN, ADV, ADV]  [VERB, PRON, DET, NOUN, ADV, ADV]  [NOUN, VERB]
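
For anyone wondering why the original attempt returns empty lists: `i not in x['pos_2']` is a membership test over the whole list, not a position-by-position comparison. In the second row, 'NOUN' also appears later in pos_2, so every tag from pos_1 is found somewhere and nothing is kept. Here is a minimal sketch that puts the two approaches side by side. The helper name `positional_diff` and its `fill` parameter are made up for illustration, and using `itertools.zip_longest` is my own assumption for the case where the two lists can differ in length (plain zip silently stops at the shorter one).

```python
from itertools import zip_longest

import pandas as pd

df = pd.DataFrame({
    'pos_1': [['VERB', 'PRON', 'DET', 'NOUN', 'ADP'],
              ['NOUN', 'PRON', 'DET', 'NOUN', 'ADV', 'ADV']],
    'pos_2': [['VERB', 'PRON', 'DET', 'NOUN', 'ADP'],
              ['VERB', 'PRON', 'DET', 'NOUN', 'ADV', 'ADV']],
})

# Membership test from the question: every tag of pos_1 occurs
# somewhere in pos_2, so both rows produce an empty list.
membership = df.apply(
    lambda x: [i for i in x['pos_1'] if i not in x['pos_2']], axis=1
).tolist()


def positional_diff(a, b, fill=None):
    """Hypothetical helper: return both tags for each position where a and b disagree.

    zip_longest pads the shorter list with `fill`, so trailing extra
    tags are reported instead of being dropped.
    """
    out = []
    for i, j in zip_longest(a, b, fillvalue=fill):
        if i != j:
            out.extend([i, j])
    return out


# Position-by-position comparison, built without DataFrame.apply.
positional = [positional_diff(a, b) for a, b in zip(df['pos_1'], df['pos_2'])]
df['diff'] = positional

print(membership)
print(positional)
```

Iterating over the two columns with zip avoids the row-wise apply, which may be noticeably faster on large frames, though I have not benchmarked it here.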
